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Abstract 

We propose robust and efficient algorithms for the joint sparse recovery problem in compressed sensing, which simultaneously 
recover the supports of jointly sparse signals from their multiple measurement vectors obtained through a common sensing matrix. 
In a favorable situation, the unknown matrix, which consists of the jointly sparse signals, has linearly independent nonzero rows. 

O 

^s^j ■ In this case, the MUSIC (Multiple Signal Classification) algorithm, originally proposed by Schmidt for the direction of arrival 

problem in sensor array processing and later proposed and analyzed for joint sparse recovery by Feng and Bresler, provides a 
guarantee with the minimum number of measurements. We focus instead on the unfavorable but practically significant case of 
QO ' rank-defect or ill-conditioning. This situation arises with limited number of measurement vectors, or with highly correlated signal 

components. In this case MUSIC fails, and in practice none of the existing methods can consistently approach the fundamental 
limit. We propose subspace-augmented MUSIC (SA-MUSIC), which improves on MUSIC so that the support is reliably recovered 
under such unfavorable conditions. Combined with subspace-based greedy algorithms also proposed and analyzed in this paper, 
' SA-MUSIC provides a computationally efficient algorithm with a performance guarantee. The performance guarantees are given 

in terms of a version of restricted isometry property. In particular, we also present a non-asymptotic perturbation analysis of the 
signal subspace estimation that has been missing in the previous study of MUSIC. 
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Index Terms 

o 

. Compressed sensing, joint sparsity, multiple measurement vectors (MMV), subspace estimation, restricted isometry property 

^ ■ (RIP), sensor array processing, spectrum-blind sampling. 

O ' 

I. Introduction 

The problem of computing a sparse approximate solution to a linear system has been studied as the subset selection problem 
C3 | in matrix computations {2j with applications in statistical regression [3j and signal processing (4), J5). The matrix representing 
the linear system and its columns are called a dictionary and atoms, respectively. The sparse recovery problem addresses the 
identification of the support, which denotes the indices of the atoms that contribute to the sparse solution, or equivalently, the 
indices of the nonzero rows of the unknown matrix. Once the support is determined, the recovery of the sparse signals reduces 
to standard overdetermined linear inverse problems. 

The study of sparse solutions to underdetermined systems dates back to 70's |6), Q. Relevant theories and algorithms have 
been further developed in the 80's JHJ , J9) and in the 90's iflOl , ifTTI . Ifl2l . Recently, this subject became more popular in 
the signal processing community with the name of compressed sensing |fl3l . In particular, the elegant analysis derived with 
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modern probability theory |[T3l , 1TT41 provided performance guarantees for polynomial-time algorithms in terms of properties 
of random dictionaries. These might be the most important contributions in recent years. 

In some applications, there are multiple measurement vectors (righthand sides of the linear system of equations), each 
corresponding to a different unknown vector, with the special property that all unknown vectors share a common support. 
The sparse recovery problem with this joint structure in the sparsity pattern is called joint sparse recovery or the multiple 
measurement vector (MMV) problem and is often an easier problem with better performance. 

In the mid 1990's, Bresler and Feng introduced "spectrum-blind sampling" |4|, 03), lfl6l . Their scheme enables sub-Nyquist 
minimum-rate sampling and perfect reconstruction of multi-band signals (analog or discrete, in one or more dimensions) with 
unknown but sparse spectral support. They reduced the spectrum-blind reconstruction problem to a finite-dimensional joint 
sparse recovery problem. Mishali and Eldar elaborated the spectrum-blind sampling approach in ifTTl (cf. Ifl8l for a more 
detailed discussion of the relationship of IfTTl to the earlier work from the 1990's J4), fT51 . Ifl6l ). and they also proposed 
"modulated wideband converter" in |fT9"l , which improves on spectrum-blind sampling by adding robustness against jitter. The 
reconstruction in the modulated wideband converter too is reduced to a finite dimensional joint sparse recovery problem. Rao et 
al. {cf. IfTTl and the references therein) introduced a joint sparse recovery formulation and methods for the recovery of sparse 
brain excitations. Obozinski et. al. |f20l formulated variable selection in multivariate regression as a joint sparse recovery 
problem. The design matrix in regression corresponds to the linear system matrix of the joint sparse recovery and the indicator 
function of the variables that mostly contribute to the given data is assumed to be sparse. Malioutov et al. posed the direction 
of arrival (DOA) estimation problem as a joint sparse recovery problem |5|. For the typically small number of sources in this 
problem, the indicator function of the quantized angles is modeled to be sparse. 

Algorithms that exploit the structure in the sparsity pattern have been developed for the joint sparse recovery problem. 
Bresler and Feng proposed to use a version of the MUSIC algorithm 12TI from sensor array processing for the full row rank 
case where the nonzero rows of the unknown matrix have full row rank 0], ifTSll , |fT6l . They also proposed methods based 
on a greedy search inspired by the alternating projections algorithm If22l in DOA estimation, later dubbed orthogonal least 
squares (OLS) [23 j. Existing solutions to the sparse recovery problem for the single measurement vector (SMV) case have 
been extended to the MMV case. Greedy algorithms If24ll . |25l , |26l extend orthogonal matching pursuit (OMP) |27l and 
convex optimization formulations with the mixed norm |5), If28l , |29"1 , 1301 extend the corresponding SMV solution such as 
basis pursuit (BP) (3TJ and LASSO J32). Sparse Bayesian learning (SBL) E3 has been also extended to the MMV case [34]], 
E5J. 

Owing to the similarity between the joint sparse recovery problem and DOA estimation, theories developed for DOA 
estimation affected those for joint sparse recovery. For example, the fundamental limit on the performance of DOA estimation 
l36l also applies to joint sparse recovery. Wax and Ziskind [36| showed the condition for the unique identification of DOA in 
the sensor array processing, which has been applied to joint sparse recovery by Feng and Bresler (4] to determine the condition 
for the unique identification of the support. The condition has been further studied in more general settings 11241 . |29l . MUSIC 
applied to joint sparse recovery B was the first method that was guaranteed with the tightest sufficient condition, which also 
coincides with a necessary condition required for the support identification by any method. However, the guarantee only applies 
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to the case where the nonzero rows of the unknown matrix have full row rank and there is no noise in the measurement vectors. 

Performance guarantees of greedy algorithms and of convex optimization formulations for joint sparse recovery have been 
also studied extensively in the literature E), |25), J29), El, (2§|, ED, J39), |40). The guarantees of such methods have 
not been proved to be strictly better than the guarantees for the SMV case. Moreover, unlike the guarantee of MUSIC, such 
methods are not guaranteed with the minimal requirement for the full row rank case in the absence of noise. 

Performance guarantees aside, the empirical performance and computational cost of any method are of key importance and 
usually determines its adoption in practice. Empirically, the optimization schemes with diversity measures (e.g., the mixed 
norm) perform better than greedy algorithms. In particular, the rate of exact support recovery in existing algorithms does not 
improve with increasing rank of the unknown matrix beyond a certain level. Furthermore, under unfavorable settings such 
as rank-defect or ill-conditioning, existing algorithms for joint sparse recovery, while not failing, are far from achieving the 
guarantee of MUSIC for the full row rank case. 

While the optimization scheme with diversity measures perform better empirically than the greedy algorithms, this improved 
performance comes at a much higher computational cost. Convex optimization formulations Q, l28l . |29l , ll30ll are usually 
cast as second order cone programming (SOCP), which is more difficult to solve compared to its analogues in the SMV case, 
which are cast as linear programming (LP) or quadratic programming (QP). In contrast, greedy algorithms and MUSIC are 
computationally efficient. As a summary, none of the listed methods enjoys both good empirical performance and computational 
speed at the same time. 

In view of the various drawbacks of the existing algorithms for joint sparse recovery, MUSIC, when it works, is extremely 
attractive. In a favorable setting, where the matrix composed of the nonzero rows of the unknown signal matrix has full 
row rank, MUSIC is guaranteed to recover the support and hence provides a guarantee with minimal requirement. Moreover, 
MUSIC is highly efficient computationally. However, the full row rank condition is often violated in practice. For example, 
if the number of measurement vectors N is smaller than the sparsity level s, then no more than N rows can be linearly 
independent, and the nonzero rows do not have full row rank. In other applications, such as spectrum-blind sampling or the 
DOA problem, N is large or even infinite. Even in this case though, the rank might be smaller than s, or the submatrix of 
nonzero rows can be ill-conditioned. For example, in the DOA problem, correlation between sources or multi-path propagation 
can cause a large condition number. It is well-known that MUSIC fails in this practically important "rank-defective" case 
and this has motivated numerous attempts to overcome this problem, without resorting to infeasible multi-dimensional search. 
However, all these previous methods use special structure of the linear system - such as shift invariance that enables to apply 
so-called spatial smoothing BT1 . Previous extension of MUSIC are therefore not applicable to the general joint sparse recovery 
problem. 

The main contributions of this paper are summarized as follows. First, we propose a new class of algorithms, subspace- 
augmented MUSIC (SA-MUSIC) that overcome the limitations of existing algorithms and provide the best of both worlds: 
good empirical performance at all rank conditions; and efficient computation. In particular, SA-MUSIC algorithms improve 
on MUSIC so that the support is recovered even in the case that the unknown matrix has rank-defect and/or ill-conditioning. 
Compared to MUSIC @j, in the presence of a rank-defect, SA-MUSIC has additional steps of partial support recovery and 
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subspace augmentation. Combined with partial support recovery by the subspace-based greedy algorithms also introduced in 
this paper, SA-MUSIC provides a computationally efficient solution to joint sparse recovery with a performance guarantee. 
In fact, the computational requirements of SA-MUSIC algorithms are similar to those of greedy algorithms and of MUSIC. 
Secondly, we derive explicit conditions that guarantee each step of SA-MUSIC for the noisy and/or rank-defective case. The 
performance is analyzed in terms of a property 0421 . which is a local version of the restricted isometry property (RIP) l43ll . 
We call this property the weak-1 restricted isometry property (weak-1 RIP). Most importantly, compared to the relevant work 
l44l with similar but independently developed ideas, the analysis in this paper is non-asymptotic and applies to wider class of 
matrices including Gaussian, random Fourier, and incoherent unit-norm tight frames. 

Contributions of independent interest include two new subspace-based greedy algorithms for joint sparse recovery with 
performance guarantees, extension of the analysis of MUSIC for joint sparse recovery to the noisy case with imperfect 
subspace estimation, and non-asymptotic analysis of subspace estimation from finitely many snapshots. The latter analysis is 
different from previous analysis of subspace methods, which were based on the law of large numbers, asymptotic normality, 
or low order expansion. 

The remainder of this paper is organized as follows. After introducing notations in Section [TTJ the joint sparse recovery 
problem is stated in Section HIT1 MUSIC for joint sparse recovery is reviewed in Section [IV] with discussion that motivates the 
current work. We also propose an algorithm for signal subspace estimation in Section [TV] In Section [V] we propose SA-MUSIC 
and subspace-based greedy algorithms. We review the notion of the weak-1 RIP in Section [VI] The weak-1 RIP analysis of 
various matrices that arise commonly in compressed sensing is given without any ambiguous constant, which might be of 
independent interest. In Section I VIII we provide non-asymptotic analysis of the algorithms of Sections [IV] and [V] by using 
the weak-1 RIP. In Section [Villi we analyze subspace estimation using a random signal model. The empirical performance of 
SA-MUSIC is compared to other methods in Section [IX] and the relation to relevant works is discussed in Section [X] 

II. Notation 

Symbol IK denotes a scalar field, which is either the real field K or the complex field C. The vector space of d-tuples over 
K is denoted by K d for d £ N where N is the set of natural numbers (excluding zero). Similarly, for to, n £ N, the vector 
space of to x ?i matrices over K is denoted by K mXTl . 

We will use various notations on a matrix A £ K mx ™. The range space spanned by the columns of A will be denoted by 
1Z(A). The Hermitian transpose of A will be denoted by A*. The j-th column of A is denoted by aj and the submatrix of 
A with columns indexed by J C [n] is denoted by Aj, where [£] denotes the set {1, . . . ,£} for £ £ N. The fc-th row of A is 
denoted by a k , and the submatrix of A with rows indexed by K C [to] is denoted by A K . Symbol et will denote the fc-th 
standard basis vector of K d , where d is implicitly determined for compatibility. The fc-th largest singular value of A will be 
denoted by <Jk{A). For Hermitian symmetric A, \k(A) will denote the fc-the largest eigenvalue of A. The Frobenius norm and 
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the spectral norm of A are denoted by ||^4||f and \\A\\, respectively. For p,q £ [1, oo], the mixed l v>q norm of A is defined by 



The inner product is denoted by (•,•). The embedding Hilbert space, where the inner product is defined, is not explicitly 
mentioned when it is obvious from the context. 

For a subspace S of K d , matrices Pg £ K dxd and Pg £ ~K dxd denote the orthogonal projectors onto S and its orthogonal 
complement S' ± , respectively. 

For two matrices A and B of the same dimension, A > B if and only if A — B is positive semidefinite. 

Symbols P and E will denote the probability and the expectation with respect to a certain distribution. Unless otherwise 
mentioned, the distribution shall be obvious from the context. 



A vector x £ K n is s-sparse if it has at most s nonzero components. The support of x is defined as the set of the indices 
of nonzero components. The sparsity level of x is defined as the number of nonzero components of x. The sparse recovery 
problem is to reconstruct an s-sparse vector xq £ K" from its linear measurement vector y £ K m through sensing matrix 
A £ K mxn . In particular, if there is no noise in y, then xo is a solution to the linear system Ax = y. In this case, under 
certain conditions on A, unknown vector xo is recovered as the unique s-sparse solution. For example, any s-sparse xo is 
recovered if and only if any 2s columns of A are linearly independent B31 . For xq to be the unique s-sparse solution to 
Ax = y, it is necessary that submatrix Aj have full column rank, where Jo is the support of xq- Otherwise, there exists 
another s-sparse solution and this contradicts uniqueness. Note that once the support of xq is determined from (y,A), then xq 
is easily computed as xq = Ajy where Aj eK sxm denotes the Moore-Penrose pseudo inverse of Aj . Therefore, the key 
step in solving the sparse recovery problem is the identification of the support. 

In practice, the measurement vector y is perturbed by noise. Usually, we assume that y is given by 



with additive noise w £ K m . In this case, xq is no longer a solution to the linear system Ax = y. Instead, minimizing \\y — Ax\\2 
with the sparsity constraint that x is s-sparse provides a solution to the sparse recovery problem. Alternatively, various convex 
optimization methods with sparsity inducing metrics such as the l\ norm have been proposed. However, the solution provided 
by such methods is not exactly s-sparse in the presence of noise. In some applications, the support has important physical 
meaning and hence the identification of the support is explicitly required. For example, in imaging applications of compressed 
sensing the support corresponds to the location of the target object, and in sparse linear regression the most contributing 
variables are identified by the support (cf. J46]). In such applications, unless the solution obtained to the sparse recovery 
problem is exactly s-sparse, a step of thresholding the obtained solution to the nearest s-sparse vector is necessary. For this 
reason, in this paper, the success of the sparse recovery problem is defined as the exact identification of the support of xq. 




III. Problem Statement 



y = Axq + w 



6 



Let us now turn to the main problem of this paper, where there exist multiple sparse signal vectors {xi]f =1 C K n that 
share the same (or similar) sparsity pattern(s) and the measurement vectors {yi}fL 1 C K m are obtained through a common 
sensing matrix A £ K mxn . We assume that the union of the supports of the Xi for i = 1, . . . , JV has at most s elements. Then, 
Xq = [ x ii ■ ■ ■ i x n] G K nxN has at most s nonzero rows and is called row s-sparse. The row support of Xq is defined as the 
set of indices of nonzero rows. The joint sparse recovery problem is to find the row support of the unknown signal matrix Xq 
from the matrix Y £ K mxJV with multiple measurement vectors (MMV) given by 

Y = AX + W 

with common sensing matrix A £ JK rnx " and with perturbation W £ K mxN . Let Jo denote the row support of Xq. Then, AXq 
is compactly rewritten as Aj Xq° where X^° is the matrix composed of the nonzero rows of X , and Aj is the submatrix of 
A with the corresponding columns. The prior knowledge that Xq is row s-sparse will be assumed Q. An important parameter 
in the problem will be the rank of the unknown signal matrix Xq, rank(Xo) = rank(X °), which will be assumed unknown 
as well. When matrix Xq° has full row rank, rank(X / °) assumes its maximum value, rank(X 7 °) = s, and we will refer to 
this as the full row rank case. This is the case preferred by the algorithms in this paper. Otherwise, rank(X /o ) < s, considered 
as violation of the full row rank case, will be called the rank-defective case. 

IV. MUSIC Revisited 

The similarity between the joint sparse recovery (or MMV) problem and the direction of arrival (DOA) estimation problem 
in sensor array processing has been well studied before (e.g. J4)). In particular, it has been shown that the joint sparse recovery 
problem can be regarded as a special case of the DOA problem with discretized angles |4|. Through this analogy between the 
two problems, the algorithms and their analysis developed for the DOA problem have been applied to the joint sparse recovery 
problem J4). In this section, we review a subspace-based algorithm proposed by Feng and Bresler |4), on which our new 
algorithm in Section IVl improves. We also elaborate the subspace-based algorithm in J4] to work without ideal assumptions. 

A. MUSIC for the Joint Sparse Recovery Problem Revisited 

Inspired by the success of the MUSIC algorithm I12TI in sensor array processing, Bresler and Feng (4), |[T6l , |fT31 proposed 
to use a version of MUSIC for joint sparse recovery. As in the original MUSIC algorithm in sensor array processing ETI . the 
first step is to estimate the so-called signal subspace S defined by 

S^K(AX Q )=K(A Jn X Q J °) 

from the snapshot matri^ Y = A Jo X^° + W £ K mxN . In sensor array processing, MUSIC [21] estimates S using the 
eigenvalue decomposition (EVD) of . The same method is applied to the joint sparse recovery problem under the assumption 

1 This is only for convenience of the analysis but is not a limitation of the proposed algorithms. See lV-Bl for more detailed discussion. 

2 We adopt the terminology from the sensor array processing literature. To emphasize the analogy between the joint sparse recovery problem and DOA 
estimation, we also call each of the N columns of Y a snapshot. Then, N will denote the number of snapshots. 
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that 

YY* A. Io X^(X J rA% 

— = + <r w I, (4-D 

which is achieved with statistical assumptions on Aj Xq° and W under the asymptotic in N (with infinitely many snapshots) 
03] ■ It is also assumed that Aj £ K mxs has full column rank and Xq° £ WL sxN has full row rank. Then, the dimension 
of S coincides with rank(X^°) = s. The assumed relation ( 14. It implies that the smallest eigenvalue of has multiplicity 
m — s, whence the dimension of S is exactly determined. The signal subspace S is then exactly computed as the invariant 
subspace spanned by the s dominant eigenvectors of since the noise part cr^J in ( 14.11 ) only shifts the eigenvalues of 

Aj X J ° (X J ° )*A* 

o __o f he joint sparse recovery problem then reduces to the case where the subspace estimation is error-free and 



N 



hence the subsequent analysis of the support recovery in (4), lfl6l . lfT31 considered this error-free casein 

Given the signal subspace S, MUSIC for joint sparse recovery identifies the row-support Jo as the set of the indices k of 
columns of A such that PgCik = 0. In other words, MUSIC accepts the index fc as an element of the support if £ S and 
rejects it otherwise. In the remainder of this paper, the acronym MUSIC will denote the version for joint sparse recovery |4| 
rather than the original MUSIC algorithm for sensor array processing |2~T1 . 

A sufficient condition that guarantees the success of MUSIC for the special case where Xq° has full row rank, is given in 
terms of the Kruskal rank ll47l defined below. 

Definition 4.1: The Kruskal rank of a matrix A, denoted by krank(A), is the maximal number fc such that any k columns 
of A are linearly independent. 

Proposition 4.2 (^j, ^75l/): Let J be an arbitrary subset of [n] with s elements. Suppose that X £ K nxN is row s-sparse 
with support J and rank(X /o ) = s. If A satisfies 

krank(A) = s, (4.2) 

then 

Ps a k = (4.3) 

for S = TZ(Aj X^ ) if and only if k e J . 

The following result directly follows from Proposition 14.21 

Corollary 4.3: Under the conditions on A and Xq° in Proposition 14.21 given the exact signal subspace S, MUSIC is 
guaranteed to recover Jo. 

Remark 4.4: In |fT51 , Condition (14.2| > was stated in terms of the "universality level" of A, which is in fact identical to 
krank(A). A similar notion called the "spark" of A was later introduced in B31 . which is related to the Kruskal rank by 

spark(A) = krank(y4) + 1. 

Remark 4.5: Condition (14. 2\ is satisfied by certain A £ K mxn with m > s. For example, it has been shown that the matrix 
composed of any consecutive m rows of the n x n DFT matrix, which corresponds to the "bunched sampling pattern" in 
spectrum-blind sampling, satisfies (14. 2\ when m > s [15 J. 

3 Obviously, for the noiseless case, the subspace estimation is error-free without relying on the asymptotic N. 
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MUSIC with its performance guarantee in Proposition 14.21 is remarkable in the context of compressive sensing for the 
following reasons. First, MUSIC is a polynomial-time algorithm with a performance guarantee under a condition that coincides 
with the necessary condition m > s for unique recovery (by any algorithm, no matter how complex) Q. Second, MUSIC is 
simple and cheap, involving little more than a single EVD of the data covariance matrix. In fact, efficient methods for partial 
EVD, or other rank-revealing decompositions can further reduce the cost 0. 

Unfortunately, as is well-known in the sensor array processing literature |2D . BP . and also demonstrated by numerical 
experiments later in Section [IX] MUSIC is prone to failure when Xq° does not have full row rank, or when it is ill-conditioned 
in the presence of noise. In sensor array processing, this is known as the "source coherence problem" l52l . and (with the 
exception of the case of a Vandermonde matrix A) no general solution to this problem are known. This motivates our work to 
propose a new subspace-based method for joint sparse recovery that improves on MUSIC. 

For the noisy case, the analysis of signal subspace estimation based on the asymptotic in N is not practical. In particular, 
from a perspective of compressed sensing (with joint sparsity), recovery of the support from a finite (often small) number of 
snapshots is desired. In the next subsection, we propose a subspace estimation scheme that works with finitely many snapshots, 
which will be applied to both MUSIC and the new subspace-based methods in this paper. In Section HV-CI we formalize the 
MUSIC algorithm for support recovery in the presence of a perturbation in the estimate of S. This will lay the ground for the 
subsequent analysis of MUSIC in the noisy scenario, and for its extension in the same scenario to SA-MUSIC. 

B. Signal Subspace Estimation from Finitely Many Snapshots 

We study the problem of signal subspace estimation from finitely many snapshots in this subsection. For later use in other 
subspace-based algorithms in Section [V] we weaken the assumptions in the previous section. In particular, we assume finite TV 
and no longer assume that has full row rank. We also assume that the columns of the noise matrix W are i.i.d. random 
vectors with white spectrum, i.e., = a\l m . (Otherwise, the standard pre- whitening schemes developed in sensor array 

processing may be applied prior to subspace estimation.) 

When Aj Xq" is ill-conditioned, the last few singular values of Aj Xq° are small, making the estimation of S = 1Z(Aj Xq°) 
highly sensitive to the noise in the snapshots. To improve the robustness against noise, instead of estimating the whole subspace 
S, we only estimate an r-dimensional subspace of S for r < s. The dimension r is determined so that the gap between 
a r (Aj XQ°) and a r+ i(Aj XQ°) is significant. 

We propose and analyze a simple scheme that determines the dimension r by thresholding the eigenvalues of so that 
the estimated signal subspace S spanned by the r dominant eigenvectors of is close to an r-dimensional subspace of S. 
The procedure for estimating the signal subspace and its dimension is described next. 

Given the snapshot matrix Y = Aj Xq° + W, we compute its sample covariance matrix Ty defined by 

r a YY* 
Ty = —- 

4 Since the mid 1990's, when these results (for what became known later as compressive sampling) (4), 1161 , 1151 were published, until recently, MUSIC 
was the only polynomial-time algorithm with such a guarantee. Recent results 1481 showed another (greedy) algorithm with the same guarantee. 

5 The rank revealing decompositions can be computed by the Lanczos method (2] for the truncated singular value decomposition (SVD). If the matrix is 
large, randomized algorithms 1491 . 1501 can be also used to compute the truncated SVD. 



Then, we compute a biased matrix T by 

r = Ty — X m (Ty)I„ 



Note that T and Ty have the same eigenvectors. Recall that our goal is to find an r-dimensional subspace S for some r < s 



from r such that there exists an r-dimensional subspace 5* of the signal subspace S = 1Z(Aj Xq°) satisfying ||-Pg-— i^|| < i] 



for small 77. For better performance of support recovery by algorithms in Section [Vj larger r is preferred. 
For an ideal case where ( 14. It holds, T reduces to T$ defined by 

Ts = N ■ 

Since S = TZ(Ts), by setting r to rank(rg), we compute S itself rather than a proper subspace of S. 

For finite N, usually, the cross correlation terms in the sample covariance matrix Ty between the noise term W and the 
signal term Aj Xq° are smaller than the autocorrelation terms. Since the autocorrelation term of W is nearly removed in F, 
we will show that it is likely that D = T — Ts is small in the spectral norm. In particular, let S be the subspace spanned 
by the r dominant eigenvectors of T; if \\D\\ is small compared to the gap between A r (r) and A r +i(r), then there exists an 
r-dimensional subspace S of S with small \\Pg — P§\\. Since \\D\\ is not available, we determine the dimension r by simple 
thresholding. More specifically, given a threshold r > 0, the dimension of S is determined as the maximal number r satisfying 

M?)-A r+1 (f) >r> A fc (f)-A fc+1 (f) Vfc> ^ 
Ai(r) Ai(T) 

When Aj Xq° (and hence Ts) is ill-conditioned, it is likely that the gap between the consecutive eigenvalues A ro (r) and 
A ro +i(r) where ro = rank(rs) is small compared to ||D||, which is roughly proportional to cr^/Ax(r). The aforementioned 
increased robustness to noise is provided by choosing r < ro so that the gap X r (T) — A r +i(r) is large., which will in turn 
reduce the estimated subspace dimension r. More sophisticated methods for determining r are possible, but this simple method 
suffices for our purposes and is amenable to analysis (see Section [VHB . This algorithm for estimating the signal subspace and 
its dimension is summarized as Algorithm Q] 

Algorithm 1 Signal Subspace Estimation 

Input: Y € K mxN and r > 0. 
Output: reN and P § e K mxm 

YY* . 



Y 



N 



V <—Ty — X m (Ty)I m ; 

r <— m — 1; 

while A r (f) - A r+ i(f) < rAi(f) do 

r <— 1 — 1; 
end while 

U 4— r dominant eigenvectors of T; 
P § i- UU*; 
return r, P§; 



10 

C. MUSIC Applied to an Inaccurate Estimate of the Signal Subspace 

In the presence of a perturbation in the estimated signal subspace S, MUSIC finds the set J of s indices that satisfy 

. \\P§akh „ \\P§^h .... 

mm — > max — . (4.5) 

keJ \\ a k\\2 ke[n]\j M a-fe M 2 

The corresponding algorithm is summarized in Algorithm |2] To provide an intuition for the selection criterion in (14.51 ). we use 
the notion of the "angle function" l53ll . 

Definition 4.6 (071/): The angle function between two subspaces Si and 5 2 is defined by 

<2(5i,^ 2 ) = sin- 1 (min{||Pi- i P5 2 ||, P^sJ}) ■ (4-6) 

Remark 4.7: The angle function <2(Si, S3) is different from the largest principal angle between Si and S2 0. Unlike the 
largest principal angle, the angle function satisfies the metric properties even when Si and S2 have different dimensions. In 
particular, when dim(5i) > dimfS^), the expression on the right hand side of (14.61 l reduces to 

< 2 (5 1 ,5 2 )=sin- 1 (||P^ i P S2 ||). (4.7) 

By ( 14.71 l. the criterion in ( |4.51 l is equivalent to 

max < 2 (S,K(a k )) < min < 2 (S,Tl(a k )). (4.8) 

ke.J fee[n]\j 

That is, MUSIC finds, among all subspaces spanned by a single column of A, s subspaces nearest to S (in the angle function 
metric). 



Algorithm 2 MUSIC 

Input: Y e K mxN , A e K mxn , seN. 
Output: J c [n] 

1: € W nxm , r£Nf- estimate signal subspace from Y; 
2: J <- 0; 

3: for £ = 1, . . . ,n do 

4: (( i- |jP_ga£|j 2 /||a f |j2 

5: end for 

6: J <— indices of the s-largest ^'s; 
7: return J 



V. SUBSPACE-AUGMENTED MUSIC 

A. MUSIC Applied to an Augmented Signal Subspace 

When has full row rank, the signal subspace S = 1Z(Aj Xq°) coincides with 1Z(Aj ). In this case, given the exact S, 
MUSIC is guaranteed to recover the support Jo because (i) a k £ S for all k £ Jo; and (ii) a k S for all fee [n] \ Jo, which 
is implied by krank(A) = s + 1 (Proposition 14.2b . However, in the rank-defective case, when Xq° does not have full row 
rank, i.e., when rank(X 7 °) is strictly smaller than the sparsity level s, we have dim(S') < rank(AT /o ) < s = dmi(lZ(Aj a ). 
Therefore, S is a proper subspace of lZ(Aj a ) and it may happen that a k S for some (or all) k £ Jq. This will cause MUSIC 
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to miss valid components of Jq. Because, in the presence of noise (imperfect subspace estimate), MUSIC selects, by ( 14.81 ), 
the s indices k £ [n] for which TZ(ak) is closets (in the sense of the angle function) to S, this may result in the selection of 
spurious indices into the estimate of Jq. This explains the well-known fact that in the rank-defective case MUSIC is prone to 
failure. 

Subspace-augmented MUSIC overcomes the limitation of MUSIC to the full row rank case by capitalizing on the following 
observation: MUSIC fails in the rank-defective case because S is a proper subspace of TZ(Aj ); however, if another subspace 
T that complements S is given so that S + T = 1Z(Aj ), then MUSIC applied to the augmented subspace S + T will be 
successful. 

Unfortunately, in general, finding such an oracle subspace is not feasible. The search procedure cannot even be enumerated. 
However, if Xq°, the matrix of nonzero rows of Xq, or more generally, the subspace 1Z(Xq"), satisfies a mild condition, then 
the search may be restricted without loss of generality to subspaces spanned by s — r columns of A. The following proposition 
states this result. 

Definition 5.1: Matrix X is row-nondegenerate if 

krank(X*) = rank(X). (5.1) 

Remark 5.2: Condition (15. It says that every k rows of X are linearly independent for k < rank(X). This is satisfied by X 
that is generic in the set of full rank matrices of the same size. In fact, an even weaker requirement on X suffices, as shown 
by the next argument that reduces the requirement to JZ(X). 

Remark 5.3: Condition ( 15. Q is invariant to multiplication of X by any full row rank matrix of compatible size on the right. 
In particular, Condition (15. U holds if and only if 

krank(Q*) = rank(X) (5.2) 

for any orthonormal basis Q of 7Z(X). It follows that ( 15. U is a property of the subspace TZ(X). Furthermore, d5.11 l also implies 
that any Q such that 1Z(Q) C Ti-(X) is also row-nondegenerate (Lemma |A.4t . 

Remark 5.4: The condition on Q in (15. 2\ that any subset of rows of Q up to size rank(X) are linearly independent is 
purely algebraic and can be paraphrased to say that the rows of Q are in general position. This is a mild condition satisfied by 
generic Q. For example, if the rows of Q are independently distributed with respect to any absolutely continuous probability 
measure, then ( 15. 2\ is satisfied with probability 1. 

Remark 5.5: Remarks 15.2145.41 validate the definition of Condition 15. II as a row-nondegeneracy condition. 

Proposition 5.6 (Subspace Augmentation): Suppose that Xq £ ~K nxN is row s-sparse with support Jq c [n] and Xq" is 
row-nondegenerate. Let S be an arbitrary r-dimensional subspace of 1Z(Aj X j °) where r < s. Let Ji be an arbitrary subset 
of Jo with s — r elements. If Aj has full column rank, then 

s + n(A. h )=n(Aj ). (5.3) 
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Proof: See Appendix iBl ■ 

Remark 5.7: Note that dim(TZ(Aj 1 )) > dim(lZ(Aj )) — dim(5) = s — r is a necessary condition for ( 15.3b . Therefore, J\ 
should be a subset of Jo with at least s — r elements for the success of the subspace augmentation. 

Remark 5.8: The row-nondegeneracy condition on Xq° is a necessary condition to guarantee ( 15.3l l for an arbitrary subset J\ 
of Jo with s—r elements. Suppose that Xq" fails to satisfy the row-nondegeneracy condition, i.e., krank((X Jo )*) < rank(X J °). 
By the assumption on S, there exists a row s-sparse matrix U £ K" xr with support Jo such that S = lZ(Aj U Ja ). Since S 
was an arbitrary r-dimensional subspace of S, without loss of generality, we may assume that krank((£/ ,/o )*) < r. By the 
projection update formula, it follows that P-k(a h )+s = Pr(A , x ) Jr Pp ± {A ) 3 anc ^ nence it suffices to show dim(P^ A ? ^S) < r 
for the failure of ( 15. 31 . Since krank(([/ Jo )*) < r, there exists Ji C Jo of size s — r such that rank([/' /o \ Jl ) < r. Then, 
dim(P^ (A/i) 5) = rank(P^ (A/i) A. /o [/ J o) = rank(P^ (A/i) A JoVl ?7 J o\ Jl ) < rank([/ J «\ J i) < r . It follows that (El fails 
for this specific 3^. Therefore, the row-nondegeneracy condition on Xq" is essential. Furthermore, by Remarks I5.2H5.5I the 
row-nondegeneracy is a mild condition; it will be assumed to hold henceforth. 

Let X^° be row-nondegenerate and suppose that an error-free estimate of S is available. In this case, Proposition 15 .61 implies 
that, given a correct partial support Ji of size s — r, MUSIC applied to the augmented subspace S + TZ(Aj 1 ) enjoys the same 
guarantee as MUSIC for the full row rank case (Proposition 14.2b . We will see in Section [VTT1 that a similar statement applies 
even with an imperfect estimate S. 

Based on the above result, we propose a class of methods for joint sparse recovery called subspace-augmented MUSIC 
(SA-MUSIC) consisting of the following steps: 

1) Signal subspace estimation: compute an estimate S of the signal subspace S = 1Z(Aj Xq°) = IZ(AXq). 

2) Partial support recovery: compute a partial support J\ of size s — r from S and A, where r = dim (5) and s is the sparsity 
level known a priori. 

3) Augment signal subspace: compute the augmented subspace S 

S = S + TZ(A Jl ). 

4) Support completion: complete Ji to produce Jo D Ji, by adding r more support elements obtained by applying "MUSIC" 
to S, that is, finding J C [n] satisfying 

. \\Pgakh ^ \\Ps a kh 
mm — — > max — — . 

fceJVi Il a fcll2 ke[n]\j || «fe M 2 

The general SA-MUSIC algorithm is summarized as Algorithm [3] An actual implementation might use orthonormal bases Q 
and Q for the subspaces S and S instead of constructing the projection operators P§ and Pg. Step [3] could then be performed 
by a QR decomposition of matrix [Q, Aj^\. 

A particular instance of the SA-MUSIC algorithm is specified by the particular methods used for the steps of signal subspace 
estimation and partial support recovery. For subspace estimation, in the analysis in Sections IVIII and [Villi we will consider the 
EVD-based scheme in Section IIV-BI However, as mentioned earlier, the subspace estimation scheme is not restricted to the 
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Algorithm 3 Subspace Augmented MUSIC 

Input: Y € K mxN , A e K mxn , seN. 
Output: J C [n] 

1: G ]R" iXm , reNf- estimate signal subspace from Y; 

2: Ji i— partial support recovery of size s — r; 

3: P^Pg + iP^Aj^Aj^; 

4: for I € [n] \ Ji do 

5: Q <- \\Pgai\\2/\\ath 

6: end for 

7: J <— Ji U {indices of the r-largest £/s}; 
8: return J 

given method. For example, if the noise W is also sparse, then robust principal component analysis (RPCA) l54l will provide 
a better estimate of S than the usual SVD. 

The choice of method for partial support recovery is discussed in the next subsection. Here we note some special cases. 
As r increases, the size of the partial support required in SA-MUSIC decreases. For small s — r, we can use an exhaustive 
search over J\, the computational cost of which also decreases in r. In particular, for the special case where r = s, the step 
of partial support recovery is eliminated, and SA-MUSIC reduces to MUSIC B). 

B. Partial Support Recovery with Practical Algorithms 

When there is a "rank-defect" in X J °, i.e., rank(X J °) < s, SA-MUSIC requires partial support recovery of size s — r. In 
addition to computational efficiency, a key desirable property of an algorithm to accomplish this is that it solves the partial 
support recovery problem more easily than solving the full support recovery problem. 

From this perspective, greedy algorithms for the joint sparse recovery problem are attractive candidates. Both empirical 
observations and the performance guarantees in the sequel suggest that the first few steps of greedy algorithms are more likely 
to succeed than the entire greedy algorithms. In other words, greedy algorithms take advantage of the reduction to partial 
support recovery when they are combined with SA-MUSIC. 

Any of the known greedy algorithms for joint sparse recovery may be used in SA-MUSIC, producing a different version of 
SA-MUSIC. In particular, we may consider variations on orthogonal matching pursuit (OMP) l27l . such as MMV orthogonal 
matching pursuit (M-OMP) l24l . simultaneous orthogonal matching pursuit (S-OMP) |25|, or their generalization to p-SOMP 
l26l . The p-SOMP algorithm incrementally updates the support by the following selection rule: given an index set J from the 
previous steps, the algorithm adds to J the index k that satisfies 

k = arg max \\Y*P^, A } ai\\ p (5.4) 

where p £ [1, oo] is a parameter of the algorithm. M-OMP and S-OMP correspond to 2-SOMP and 1-SOMP, respectively. 

In particular, we propose and analyze two different algorithms for the partial support recovery step in SA-MUSIC. The first 
is the signal subspace orthogonal matching pursuit (SS-OMP) algorithm. SS-OMP is a subspace-based variation of M-OMP 
(2-SOMP) that replaces the snapshot matrix Y in ( 15. 41 by the orthogonal projector P§ onto the estimated signal subspace. 
(Equivalently, Y is replaced by an orthogonal basis matrix for the estimated signal subspace). Hence, given the estimated 
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support J from the previous steps, SS-OMP updates J by adding k selected by 

k = arg max \\P § P£, A )0^|| 2 . (5.5) 
The complete SS-OMP is summarized as Algorithm |4] 



Algorithm 4 Signal Subspace Orthogonal Matching Pursuit (SS-OMP) 

Input: Y G K mxAr , A € K mxn , s G N. 
Output: J C [n] 

1: G R mxm , reNf- estimate signal subspace from Y; 
2: J^0; 

3: while | J | < s do 

4: fc <- arg max \\P S P^ (A ^ath; 

5: J^JU{k}; 
6: end while 
7: return J 



The second algorithm we propose for the partial support recovery step in SA-MUSIC is signal subspace orthogonal matching 
subspace pursuit (SS-OMSP). SS-OMSP is a modification of another greedy algorithm, rank-aware order recursive matching 
pursuit (RA-ORMP) [48 1 Q SS-OMSP replaces the snapshot matrix Y in RA-ORMP by the orthogonal projector P s onto the 
estimated signal subspace, or equivalently, by an orthogonal basis matrix for the estimated signal subspace. Given the estimated 
support J from the previous steps, RA-ORMP updates J by adding k selected by 

fc = arg max \\(P R{P ± ,Y))MW\\ p n(Aj) a eh- ( 5 - 6 ) 
Similarly, SS-OMSP updates the support by adding k selected by 

fc = arg max || (P pX §) a th/W P n(Aj) a eh- (5-7) 

lG[n\\J K(ij) v 



In general, P^, Aj ^Y and P^, Aj ^Pg span different subspaces, and hence the two projection operators used in ( 15.6b and (15.7b 
are different. The two projection operators coincide only for the special case when there is no noise. 
We interpret ( 15.71 ) using the angle function between two subspaces, i.e., (|5.71 i is equivalent to 

k = arg min < 2 (P£ (Aj) S , P£ (Aj) K(a e )) . (5.8) 

Given the subspace TZ(Aj), which is spanned by the columns of A corresponding to support elements J determined in the 
preceding steps of the algorithm, the selection rule finds the nearest subspace to P^ Aj ^S among all subspaces, each of which 
is spanned by Pj£, A sCte for £ G [n] \ J. The name "orthogonal matching subspace pursuit" is intended to distinguish the 
matching using a subspace metric in SS-OMSP from that of OMP and its variations. 

Again, we use a partial run (s — r steps) of SS-OMSP for partial support recovery and switch to MUSIC applied to the 

6 The name "Rank-Aware ORMP" proposed for this algorithm appears to be a misnomer. RA-ORMP, as originally proposed |48| , does not have any feature 
to determine rank. It computes an orfhonormal basis for Y. However, whereas in the ideal, noiseless case this basis will have dimension r equal to the rank 
of Xq° , with any noise present Y will have full rank equal to min{m, N}, and this will also be the dimension of the computed orthonormal basis. Hence, 
the algorithm does not seem to have any built-in rank-awareness. 
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augmented subspace. The complete SS-OMSP is summarized as Algorithm [5j 



Algorithm 5 Signal Subspace Orthogonal Matching Subspace Pursuit (SS-OMSP) 

Input: Y € K mxN , A e K mxn , s 6 N. 
Output: J c [n] 

1: Pg €E K mxm , r£Nf- estimate signal subspace from Y; 
2: J^0; 

3: while I J| < s do 

4: fc-<-arg max || (P pX §) a eh/\\ P ii(Aj) a e\W> 

5: J^JU{fc}; 
6: end while 
7: return J 



C. Stopping Conditions for Unknown Sparsity Level 

In most analyses of greedy algorithms, the sparsity level s is assumed to be known a priori. In fact, this assumption is 
only for simplicity of the analysis and not a limitation of the greedy algorithms. For example, M-OMP recovers a support 
of unknown size by running the steps until the residual \\P A Y\\p falls below a certain threshold that depends on the noise 
level (or vanishes, in the noiseless case). SS-OMP recovers a support of unknown size similarly but a different criterion may 
be used for the stopping condition. Assume that the estimated signal subspace S satisfies dim(S') < dim(S'). After k steps, 
SS-OMP returns J of size k. Whether to continue to the next step is determined by the stopping condition given by 

< 2 (S,K(A J )) = \\P£ (A „ ) P § \\<r 1 

for a threshold rj. For the noiseless case, rj is set to and, for the noisy case, 77 is set to an estimate of <2(<5 f , 1Z(Aj )). Let 
us consider the solution Jns given by the enumeration of all possible supports: 

Jns = min |J| 

JC[n],\J\>r 

subject to < 2 (S,Tl(Aj)) < rj. 

We assume that | Jns| = s - Otherwise, in the noiseless case, this implies that the support is not uniquely determined. If 
\Jns \ = s, then SS-OMP stops after s steps whenever it is guaranteed to recover the support with known s. 

Similarly, SA-MUSIC with SS-OMP (SA-MUSIC+SS-OMP henceforth) can recover the support without knowledge of s 
by applying the same stopping criterion for SS-OMP, which is summarized in Algorithm [6] The update criterion in Step [9] 
of Algorithm |6] determines the SA-MUSIC algorithm. For example, SA-MUSIC+SS-OMP uses the condition in ( 15.5b whereas 
SA-MUSIC with SS-OMSP (SA-MUSIC+SS-OMSP henceforth) uses the condition in ( 15771 . If |J NS | = s, then SA-MUSIC 
algorithms return support of size s whenever they are guaranteed to recover the support with known s. For simplicity, the 
analyses in Section IVIII will assume known sparsity level. 
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Algorithm 6 SA-MUSIC for unknown s 

Input: Y G K mxN , A G K mx ™, 77 > 0. 
Output: J C [n] 

1: Pg G M. mxm , r£Mf- estimate signal subspace from Y; 

2: J^0; 

3: for £ = 1, . . . , n do 

4: & <~ || -Pg^ lb/ 1| ^ || 2 

5: end for 

6: J <h- indices of the r-largest Cfe's; 

7: Ji «- 0; 

8: while \\Pkc Aj ) P s\\ > V d0 

9: Select fc by an update criterion; 

10: Ji <- Ji U {/c}; 

ii: P2<-P3+(i^A Jl )(J^4 Jl )t ; 

12: for £ g [n] \ Ji do 

13: Q i- ||P^|| 2 /||a^||2 

14: end for 

15: J <— J\ U {indices of the r-largest Q's}; 

16: end while 

17: return J 



VI. Weak Restricted Isometry Property 

A. Uniform Restricted Isometry Property 

The restricted isometry property (RIP) has been proposed in the study of the reconstruction of sparse vectors by l\ norm 
minimization ||431 . Matrix A G K mXTl satisfies the RIP of order s if there exists a constant S G (0, 1) such that 

(1- 5)\\xf 2 <\\Axf 2 <(l + S)\\x\\l Vx, Wo < s. (6.1) 

The smallest 5 that satisfies (16. U is called the restricted isometry constant (RIC) of order s and is denoted by 5 S (A). Note 
that (16. U is equivalent to 

(1 - 5)1. < A*jAj < (1 + (5)/,, VJ c [n], | J| = s (6.2) 

and hence <5 S (A) satisfies 

6 S (A) = max\\A*jAj - I 3 \\. 

|7|=s 

The RIP of order s implies that all submatrices of A with s columns are uniformly well conditioned. 

B. Weak Restricted Isometry Property 

In many analyses of sparse signal recovery, the uniform RIP is unnecessarily strong and requires a demanding condition on 
5 S {A) that is not satisfied by the matrices that arise in applications. Therefore, weaker versions of RIP have been proposed, 
tailored to specific analyses 11421 . Il55l . 

Matrix A G K mx ™ satisfies the weak restricted isometry property (weak RIP) ||55l with parameter (J, s, t, S), where s, t G N, 
J c [n] with | J | = s, and S G (0, 1), if 

(1 - S)I s+t < A* K A K < (1 + 6)I s+t , VK D J, \K\ =s + t. (6.3) 
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The corresponding weak restricted isometry constant is given by 



6™f(A;J)= max \\A* K A K - I s+t \\. 

\K\ = s+t 

The special case of the weak RIP with t = 1 has been previously proposed 11421 to derive an average case analysis of the 
solution of the MMV problem by the mixed ^2,1 norm minimization, also known as MMV basis pursuit (M-BP) |29l . This 
specific case of the weak RIP with t = 1, which we call the weak-1 RIP, is useful for the analysis in this paper. Obviously, 
the weak-1 RIP is satisfied by a less stringent condition on A. In the following, we list some matrices that satisfy the weak-1 
RIP along with the required conditions Q. Importantly, in addition to Gaussian matrices, which lend themselves to relatively 
easy analysis, this list includes deterministic matrices that arise in applications, and provides reasonable constants for them. 

C. Gaussian Matrix 

Eldar and Rauhut derived a condition for the weak-1 RIP of an i.i.d. Gaussian matrix [42 Proposition 5.3]. Their proof starts 
with the concentration of the quadratic form ||(ja;||| around its expectation where G is an i.i.d. Gaussian matrix, to bound the 
singular values of G, which has been originally proposed in 1551 . We provide an alternative and much tighter condition for 
the weak-1 RIP of A directly using the concentration of the singular values of an i.i.d. Gaussian matrix 1571 . 

Proposition 6.1: Given m, n G N with m < n, let A g R mxn b e a n i.i.d. Gaussian matrix whose entries follow Af(0, — ). 
Suppose that J is a subset of [n] with s elements. For any e,S E (0, 1), if 



VsTT+ J 2 In 



2{n—s) 



then, 



m > — = — , (6.4) 

VT+S-1 



^f(A; J)>5)< e, (6.5) 



where the probability is with respect to A. 

Proof: See Appendix ICl ■ 

Remark 6.2: For large s such that the log term is negligible, Condition ( 16.41 ) reduces to m/s > (\/l + S — 1)~ 2 . For 8 f» 1, 
the oversampling factor is approximately 6. 

Remark 6.3: Using the concavity of the square root function, it follows that 

m > , * L + 21n(^^)+ll, (6.6) 



(VT+5-1) 2 v 

is a sufficient condition for ( 16.41 ) and hence also guarantees d6.5t . 

Remark 6.4: By slightly modifying the proof of Proposition 16. U we obtain the uniform RIP of A: if 



m > 



(VTT5~i) 2 



3 + In 



2 s 

s + 21n( - I +1 J>, (6.7) 



7 We show that, compared to the uniform RIP, the requirement on the number of measurements to satisfy the weak-1 RIP is reduced by large factors, 
ranging between 200 to thousands fold. 
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then 

V(S S (A) >S)<e, 

where the probability is with respect to A. Compared to Condition ( 16.6b required for the weak-1 RIP, in Condition ( 16.71 for 
the uniform RIP, the required oversampling factor — has been increased roughly by the factor 3 + 2 In (— ). 

We also consider variations of the asymmetric RIP |58l , ||59l . Similarly to the weak-1 RIP, A satisfies the weak-1 asymmetric 
RIP if there exist a, (3 > such that 

a < a s+1 (A JU{j} ) < <n (A JU{j} ) < 0, Vj G [n] \ J. 

The corresponding weak-1 asymmetric RICs are defined as follows: 

a$$(A;J)± mm a s+1 (A JU{j} ), 

j£[n]\J 

fi#(A;J)± max ^(A/um)- 

j£[n]\J 

Proposition 6.5: Given m, n € N with m < n, let A G R mxra be an i.i.d. Gaussian matrix whose entries follow Af(0, — ). 
Suppose that J is a subset of [n] with s elements. For any £,76 (0, 1), if 



m > i , (6.8) 

7 

then, 

P(K;?(A; J) < 1 - 7] V [^(A; J) > 1 + 7]) < e, (6.9) 

where the probability is with respect to A. 

Proof: See Appendix ID] ■ 

Remark 6.6: For large s such that the log term is negligible, Condition ( 16.81 ) reduces to ^ > J^. For 7 ~ 1, the oversampling 
factor is approximately 1. 

Remark 6. 7: Using the concavity of the square root function, it follows that 

m>Aj s + 21n (fciP) +1 j j (6 .10) 

is a sufficient condition for ( 16.81 ) and hence also guarantees d6.9t . 

Remark 6.8: Proposition 16.51 provides a sufficient condition, which is not necessarily tightest, in particular, in the limit when 
7 — > 1. Indeed, an i.i.d. Gaussian A satisfies aJS 4 (A; J) > with probability 1 if m > s, but Condition ( 16.81 ) does not 
converge to m > s as 5 approaches 1. Nevertheless, this gap vanishes if s goes to infinity with n = o(e s ), that is, s grows 
faster than Inn. 



19 

D. Uniformly Random Partial Fourier Matrix 

Candes and Plan ll55l showed that a matrix A composed of randomly selected rows of a DFT matrix satisfies the following 
local restricted isometry property under a certain mild condition: — I s \\ < 6 with high probability for a fixed J C [n] 

of size s (J55] Lemma 2.1]) Q It is not difficult to derive the weak-1 RIP of such a matrix from its local RIP. We only need to 
consider the union of the events corresponding to all subset of [n] that include the support J and one more element outside J. 

Proposition 6.9 (Corollary to 05] Lemma 2.1]): Suppose that A is obtained by randomly selecting m rows of the n x n 
DFT matrix independently, each with probability — , followed by normalization of each column in £2 norm. Also suppose that 
J is a fixed subset of [n] of cardinality s. For any e,5 E (0, 1), if 

m >f^( h ffci)) +h()+1) }, !+1)> «,.„> 



then 



>(5™f(A; J) > S) < e. (6.12) 



where the probability is with respect to A. 

Proof: See Appendix [E] ■ 
Remark 6.10: The uniform RIP of a random partial Fourier matrix has been studied before |60l , 11611 , l62l . In particular, 
Rauhut 1 62 Theorem 8.4] showed a sufficient condition with explicit constants given by 

> C(T 2 ln 2 (100s)ln(4n)s, 



ln(10m) 

m > DS~ 2 ln(e _1 )s, 

where C < 17, 190 and D < 456, which is a much more demanding condition than ( 16.1 11 1. 

Remark 6.11: As discussed in the previous subsection, the algebraic analogue of the weak-1 RIP condition is usually much 
easier to satisfy. Feng and Bresler flSi showed that a matrix A composed of m consecutive rows of the n x n DFT matrix 
with m > s + 1 □ satisfies 

min <i a 1 k M;J)>0 (6.13) 

JC[n],|J|=« + 

Candes et al. lfl~4l showed that if n is a prime number, then the above result holds for any m rows. Note that m > s + 1 is a 
much milder requirement than (16.111) and the property is deterministic, i.e., holds always. However, the properties mentioned 
in this remark are purely algebraic and cannot be used in the analysis with noise. 

E. Incoherent Unit-Norm Tight Frame with Random Support 

Incoherent unit-norm tight frames l63l also satisfy the weak-1 RIP with high probability under a mild condition if the 
support J is uniformly random. Let A £ K mxn be a unit-norm tight frame. Then, each column of A has unit £2 norm and 

8 In fact, the result in 1551 and hence our argument derived from their result apply to a wider class of matrices. 
'The selection of the rows in this pattern was called the "bunched sampling pattern" in 1151 



the rows of A are orthogonal [63|. A unit-norm tight frame also satisfies ||A|| = The coherence fj, of A is defined by 

a \{ak,a>i)\ 
fj, = max T — — — — 

and is always bounded from below by the Welch bound l64l 



^ - V I 1V 
m{n — 1) 

For example, the rows of DFT matrix selected by the difference set method |f65l form a unit-norm tight frame that achieves 
the Welch bound. 

Based on the statistical RIP analysis in l66l . we derive the following result. 

Proposition 6.12: Suppose that A £ ]g mx ™ i s a fixed unit-form tight frame with coherence fi satisfying 

K 



< 



for some constant K > 0. Also suppose that J is uniformly random among all subsets of [n] with s elements. For any 

t,5e (0,1), if 

4, fp. ( „ / n, — .<? \ 1 

1 , (6.14) 



then 



m > V!| s + 288A - ln ^ 



\5:^{A- J)>8)< e. 



where the probability is with respect to J. 
Proof: See Appendix 10 

Remark 6.13: If A achieves the Welch bound, then the constant K in Proposition 16. 121 is no greater than 1. 
Remark 6.14: By slightly modifying the proof of Proposition 16.121 we obtain the uniform RIP of A 

W(S S (A) >5)<e 



if 



m > 



5 2 



{l + 576A' 2 In (^) } a + 288K 2 In (^j + 1 



(6.15) 



Compared to the weak-1 RIP, for the uniform RIP, the oversampling factor — has increased roughly by the factor 1 
576^ lnfsa). 



VII. Performance Guarantees 

A. MUSIC for the Full Row Rank Case 

With an imperfect estimate of the signal subspace, the support recovery by MUSIC is no longer guaranteed by an algebraic 
property of A. Instead, in the following proposition, we provide a new guarantee. 
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Theorem 7.1 (MUSIC, noisy, full row rank case): Assume that Xq G K" xAr is row s-sparse with support Jo C [n] and Xq° 
has full row rank. Let S be an s-dimensional estimate of S = TZ(Aj X^°) such that \\Pg — Pg\\ < r\ for some rj < 0.5. If A 
satisfies one direction of the weak-1 asymmetric RIP 

c£#(A;J )>a (7.1) 

for 

a > 2^(1 -^ll^lla^, (7.2) 

then MUSIC applied to Pg will identify J . 

Proof: See Appendix iGl ■ 

Remark 7.2: The columns of the sensing matrix A are often normalized in the l<z norm (e.g., for a partial Fourier matrix) or 
their £2 norms are highly concentrated around 1 (e.g., i.i.d. for Gaussian matrix). We therefore consider the quantity || A* || 2,00 
to be 1 or close to 1. In particular, we assume that all columns of A are normalized in the £2 norm ("A is normalized," in 
short) in all the figures and the numerical experiments in this paper. 

Remark 7.3: With normalized A, we have || A* || 2.00 = 1. and the weak-1 RIP S%^(A; Jq) < 1 — a 2 is a sufficient condition 
for (ED. 

Remark 7.4: When the signal subspace estimation is perfect (i.e., in the noiseless case), Conditions d7.lt and jl 2\ reduce to 
Qfjli (A; Jo) > 0, which is an algebraic condition implying rank[A,/ , Oj] = s+ 1 for all j € [n] \ Jq. This algebraic condition 
is implied by a much milder condition than the weak-1 asymmetric RIP, which is an analytic condition. For example, an i.i.d. 
Gaussian A with m > s + 1 satisfies this with probability 1 . 

Remark 7.5: Theorem 17.11 guarantees that MUSIC recovers a fixed support Jo. Replacing Condition ( 17.11 ) by its uniform 
analog, a s +i{Aj) > a for all J C [n] with | J| = s + 1, provides a uniform guarantee that MUSIC recovers an arbitrary 
support of size s. With perfect subspace estimation, the uniform guarantee reduces to Proposition 14.21 

B. SA-MUSIC with Given Partial Support 

SA-MUSIC finds the support by using the augmented subspace S constructed as S = TZ(Aj 1 ) + S, where Ji is a subset of 
Jo of size s — r and S is the estimated signal subspace of dimension r. We assume that there exists an r-dimensional subspace 
S of the signal subspace S = TZ(Aj Xq°) satisfying ||Ps — Pg\\ < rj. It will be shown in Section IVIIII that, if the number 
of snapshots is large enough relative to the noise variance, then Algorithm Q] computes S with the property that such an S 
exists. Recall (by Proposition |5.6t that assuming row-nondegenerate X J ° implies IZ^Aj^ + S = R(Aj ), which is desired by 
the MUSIC step in SA-MUSIC. However, because S is constructed using S rather S, which is not available, to show noise 
robustness of support recovery, we need to bound ||Pjr — P-r(Aj )\\- By tne projection update formula, it follows that 

Ps - Pn(A Jo ) = Pp^ Aii ~s ~ Pp^ Aji) S- 
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Therefore, we need to consider the distance between the two subspaces S and S as projected onto 1Z(Aj 1 ) ± . However, in 
general, projecting onto another subspace can either increase or decrease the distance between subspaces arbitrarily. In our 
specific case, the distance is bounded depending on the condition number of Aj . We state the result in a formal way in the 
following proposition. 

Proposition 7.6: Assume that Xq e K™ xAr is row s-sparse with support Jo C [n] and Xq° is row-nondegenerate. Let S 
be the estimated signal subspace of dimension r where r < s. Suppose that there exists an r-dimensional subspace S of the 
signal subspace S = 1Z(Aj Xq°) satisfying \\Pg — P§\\ < r/. Let J be a proper subset of Jo. If A satisfies 

°s(Aj ) 



then 



> V, (7-3) 



\ P P-.,S- P P^..,s\\ < TTT 1 ^^- (7-4) 



Proof: See Appendix IH1 ■ 
We are now ready to state one of the main results of this paper. 

Theorem 7.7 (SA-MUSIC with Correct Partial Support): Assume that X £ ~K nxN is row s-sparse with support J C [n] 
and Xq° is row-nondegenerate. Let S be the estimated signal subspace of dimension r, where r < s. Suppose that there exists 
an r-dimensional subspace S of the signal subspace S = 1Z(Aj Xq°) satisfying P§\\ < V- Let J\ be an arbitrary subset 

of Jo with s — r elements. If A satisfies the weak-1 asymmetric RIP 



for a and (3 satisfying 



a < a:^(A; J ) < fi#(A; J ) < P 



1 



'1 - 



> 



2r,f3 



l^llloo '- a-7,/3' 



(7.5) 



(7.6) 



then MUSIC applied to S = S + TZ(A Jl ) using the criterion 

. ||i>fe||2 



> max 



\P^k\ 



(7.7) 



k£j \Ji \\ak\\2 ke[n]\J \\ak\\2 

will identify Jo \ J\ . 

Proof: See Appendix U I 
Remark 7.8: With normalized A, Condition d73t+dJ6ll is implied by the weak-1 RIP of A given by 6™+i(A] J ) < S for 



< 



1-5 1 - VS 

T+s ' 3-Vs" 



(7.8) 



Remark 7.9: Compared to the guarantee on MUSIC (full row rank case, Theorem l7.lt . the guarantee for SA-MUSIC (for the 
rank defective case, Theorem l7.7b additionally requires Xq° to be row-nondegenerate. For the noiseless case, with normalized 
A, both Theorem l7.1l and Theorem 17.71 require only a mild algebraic condition a™+i(A; Jo) > 0. However, as shown in Fig. Q] 
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even with known partial support SA-MUSIC for the rank defective case suffers more from the perturbation in the subspace 
estimate than does MUSIC in the full row rank case. This difference in the sensitivity to the subspace estimation error is due 
to the reduced dimension r < s of the estimated signal subspace, which in turn is due to the rank defect of Xq°. 

Remark 7.10: Theorem 17.71 provides a performance guarantee for SA-MUSIC with a correct partial support estimate, but 
noisy subspace estimate. In this scenario, SA-MUSIC provides its best performance. How realistic is this assumption? In the 
next subsections, we will show that if the error in the subspace estimate is small enough, suboptimal greedy algorithms are 
indeed guaranteed to recover the partial support exactly. In particular, when r/s is large, SA-MUSIC combined with partial 
support recovery by greedy algorithms provides guarantees comparable to that given in Theorem 17.71 for SA-MUSIC with 
correct partial support estimate. 




0.1 0.2 0.3 0.4 0.5 

V 



Fig. 1. Comparison of MUSIC for full rank case vs. SA-MUSIC with known partial support in the rank defective case: trade-off between parameter 8 (for 
the weak-1 RIP) and r/ (for subspace estimate perturbation). The region below the curve provides a guarantee. 



C. SA-MUSIC with Partial Support Recovery by SS-OMP 

We first propose a new sufficient condition for a guarantee of M-OMP, which is of independent interest in its own right. 
Proposition 7.11 (M-OMP): Suppose that X <G K nxJV is row s-sparse with support J C [n]. Let Y = AX a + W. Given 
J C J , if A satisfies the weak-1 RIP 

5:^(A;J a )<5 (7.9) 

for 

6 < ii*o JoV7 ii^-y^ k 

2||* M II 

then the next step of M-OMP will identify an element of Jo. 

Proof: See Appendix [J] ■ 
Remark 7.12: Applying previous analysis [26 Theorem 5] of p-SOMP to the case p = 2 provides an alternative guarantee 
for M-OMP. However, the conditions for the latter are stated in terms of the p-Babel function and its variations rather than 
the weak-1 RIP as in our Proposition 17.1 II The two guarantees are not directly comparable, but neither one is uniformly less 
or more demanding than the other in terms of the conditions required. Because the form of the condition in Proposition 17.1 II 
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is amenable to our analysis, it is preferred in this paper. 

Remark 7.13: Unlike MUSIC, M-OMP is guaranteed only if the orthogonality between two vectors is nearly preserved 
through A, Therefore, unlike MUSIC, M-OMP is not guaranteed by one direction of the weak-1 asymmetric RIP. 

Remark 7.14: For the noiseless case (W = 0), since 

ll*n MJ |koo . 1.1 



||x 7oV7 ll - ypoWT " V^' 

it follows that Condition (|7.91 >+ (I7. 10b is implied by 6J^(A; Jo) < which is a weaker requirement than the previously 
known sufficient condition S s +i(A) < J39). 

Next, the guarantee of SS-OMP is obtained as a corollary to Proposition 17.111 Its main utility is as a stepping stone to the 
guarantee for SA-MUSIC with partial support recovery by SS-OMP. 

Corollary 7.15 (SS-OMP): Assume that Xq G K nxN is row s-sparse with support Jo C [n] and let S be the estimated signal 
subspace of dimension r < s. Suppose that there exists an r-dimensional subspace S of the signal subspace S = TZ(Aj Xq°) 
satisfying \\P§ - P s \\ < rj. Let $ G K sxr satisfy $*$ = I r , and S = ll(Aj §). Let pk{$) denote the k-th largest l 2 norm 
of the rows of If A satisfies the weak-1 RIP 

5™f(A;J )<8 (7.11) 



for 6 satisfying 

p fc ($) 26 



> 27,11 A* Ha,,,,, (7.12) 



then the first k steps of SS-OMP will identify k elements of Jq. 

Proof: See Appendix iKl ■ 

The conditions for S in terms of S and <E> appear rather technical, but we will show that they are satisfied by our proposed 
subspace estimation scheme. Furthermore, their implications are interpreted in the following remarks. 

Remark 7.16: Since Pk(&) corresponds to the l<x norm of a row of Pk(&) does not change by applying a rotation to the 
right of $. In other words, for fixed S, Pk($) is invariant to the choice of $. Now, since Pk(&) is monotonically decreasing in 
k, the sufficient condition d7.lU - d7.12b of Corollary 17. 151 gets more stringent (requires a smaller weak-1 RIC) as k increases 
toward s. This implies that the smaller the size of the partial support that is to be recovered by SS-OMP, the less stringent the 
condition for the guarantee on the recovery. Moreover, for k > s — r, the guarantee of Corollary 17. 15l requires a more stringent 
condition than the guarantee on the remaining step of SA-MUSIC in Theorem 17. 71 where a partial support of at least size s — r 
is assumed to be known. For these two reasons, switching to SA-MUSIC applied to the augmented subspace after successful 
partial support recovery of size exactly k = s — r by SS-OMP is preferred rather than continuing any of the remaining steps 
of SS-OMP. 

Clearly, the smaller p s _ r (<I>), the more stringent the condition on 5™^(A; Jo). We therefore provide a deterministic lower 
bound on p s _ r ($). The bound is based on the Cauchy-Binet formula, and holds for r/s > 0.5. To facilitate the interpretation 
of the condition, the bound of Lemma 17.171 is visualized in Fig. [3] 

Lemma 7.17: For r, s G N where s/2 < r < s, let $ G K sxr satisfy <&*<!> = I r , and let pk(&) denote the k-th largest £ 2 
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norm of the rows of Then, 



where p(s, r, q) is defined by 



Ps-r(&) > P(s,r) = sup p(s,r,q) 
~ q>a 



(7.13) 



( ( «! 



P(s,r,q) = 



\ -9/2r 

7'!(,s — r)\ J 



1/9 



(7.14) 



for q > 0. 

Proof: See Appendix O 




Fig. 2. Lower bound p(s,r,q) on p s _ r (C7) by Lemma f7 . 1 7 1 with g = 10 



Combining Lemma 17.171 and Corollary 17 . 1 5 1 (for the success of the first s — r steps of SS-OMP) with Theorem l7.7l we obtain 
another of our main results: the following theorem provides a deterministic performance guarantee of SA-MUSIC combined 
with SS-OMP. The proof is straightforward and therefore omitted. 

Theorem 7.18 (SA-MUSIC+SS-OMP, rank-defective case): Assume that Xq £ ~K nxN is row s-sparse with support Jq c [n] 
and X Jo is row-nondegenerate. Let S be the estimated signal subspace of dimension r, where s/2 < r < s. Suppose that 
there exists an r-dimensional subspace S of the signal subspace S = 1Z(Aj Xq°) satisfying \\Pg — P§\\ < rj. If A satisfies 
the weak-1 RIP 

S:^(A;J )<5 (7.15) 



for S satisfying 



and 



1 - 5 



> 



p(s,r) 



I 2,00 
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^>2 ?7 ||A*|| 2 , ( 



where p{s,r) is defined in Lemma I77T71 then SA-MUSIC+SS-OMP applied to S will identify J . 



(7.16) 



(7.17) 



As discussed in Remark 15.41 the row-nondegeneracy condition on Xq° is a mild condition essential to SA-MUSIC and 
likewise Condition (17.15b + ( 17.16b is inherited from SA-MUSIC (see Theorem [7771 and Remark l731l. Condition ( 17.151 ) + ( 17.17b , 
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on the other hand, is due to the use of SS-OMP, a suboptimal algorithm, for the partial support recovery (see Corollary 17.1 5 l l. 
For the noiseless case (77 = 0), with normalized A, (I7.16l l reduces to S < 1, which is a necessary condition for (17.17b . Therefore, 
it suffices to satisfy ( 17.15I )+( |7.17I ). where ( 17.17b reduces to 

p(s,r) 26 

- - -== > 0. (7.18) 



■s = 8 
■s = 128 

■ s = 2048 
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Fig. 3. Required weak-1 RIC for the guarantee for SA-MUSIC+SS-OMR rank-defective case in Theorem 17.1 81 (n = 0). The dot at the top right of the plot 
represents the weak-1 RIP a™5fi(A, Jq) > required in the full rank case, r = s. 



As shown in Fig. [3] the weak-1 RIC 8 required by (17.18l l increases with increasing r/s. In other words, SA-MUSIC+SS- 
OMP benefits from higher dimension of the signal subspace S. In particular, when r = s, SA-MUSIC reduces to MUSIC for 
the full rank case and hence it suffices to satisfy 5J^(A; Jq) < 5 = 1, which is plotted as the dot at the top right in Fig. [3] 

Fig. 2] visualizes the trade-off between S and 77 given by Conditions ( 17.161 ) and ( 17.161 ) for the noisy case. Given r/s, any pair 
(»7,5) below the curve determined by r/s guarantees SA-MUSIC+SS-OMP. As r/s increases, the curve shifts upward and the 
guarantee is given by larger 6 and/or larger 77. In particular, when r/s = 1, SA-MUSIC reduces to MUSIC without the step 
of partial support recovery and hence, by Theorem 17.11 the trade off is given by 

1 - Vs 



which is also plotted in Fig. |4] 



D. SA-MUSIC with Partial Support Recovery by SS-OMSP 

Proposition 7.19 (SS-OMSP, rank-defective case): Assume that Xq 6 K nxN is row s-sparse with support Jo C [n] and 
Xq° is row-nondegenerate. Let S be the estimated signal subspace of dimension r, where r < s. Suppose that there exists an 
7--dimensional subspace S of the signal subspace S = TZ(Aj Xq°) satisfying \\P§ — P§\\ < V- Given J C J , if A satisfies 
the weak-1 asymmetric RIP 

a < a^(A; J ) < ^(A; J ) < /3 (7.19) 
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Fig. 4. Trad e-off between parameters <5 (for the weak-1 RIP) and r\ (for subspace estimate perturbation) for the guarantee of SA-MUSIC+SS-OMP in 
Theorem 17. 181 Values (rj, <5) in region below the curve provide a guarantee. 



for a and /? satisfying 



/ dim(P^ (A/) S) g r~ a 2 277/3 

s-\J\ 'Plkoo V -a- V P 



1 - n > ^> ( 7 - 2 °) 



then the next step of SS-OMSP will identify an elements of Jo \ J. For the special case when r = s, Condition ( 17.20b is 

replaced by the weaker condition 

a 2 2nB 



Proof: See Appendix [M] ■ 
In the full row rank case, Xq° is trivially row-nondegenerate. In the noiseless case, we have rj = and with normalized 
A, Condition ( 17.211 ) reduces to a^^(A; Jo) > 0. Therefore, for the full rank and noiseless case, SS-OMSP is guaranteed by 
a™+i(A; Jo) > 0, which coincides with the condition for the guarantee of MUSIC in the same scenario (Proposition 14. 2\ . In 
fact, in this noiseless and full row rank case, SS-OMSP is equivalent to the corresponding data domain algorithm, RA-ORMP. 
The coincidence of the guarantees of MUSIC and of RA-ORMP in this special case has been shown before l48l . (Unlike the 
analysis of RA-ORMP fl48l though, Proposition 17 . 1 91 also applies to the noisy and/or rank-defective cases.) 

If is row-nondegenerate, then by Proposition [5761 dim(lZ(Aj) + S) = s for any J C Jo with | J| = s — r, which implies 
dim(P^ Aj ^S i ) = r. Then, we also have &\m(P^ A ^S) = r for any | J| < s — r. Combining this result with Proposition 15.61 
Proposition 17.191 and Theorem 17.71 we obtain another main result of this paper: a guarantee for SA-MUSIC+SS-OMSP. 

Theorem 7.20 (SA-MUSIC+SS-OMSP, rank-defective case): Assume that Xq g K nxN is row s-sparse with support Jo C [n] 
and X^° is row-nondegenerate. Let S be the estimated signal subspace of dimension r, where r < s. Suppose that there exists 
an r-dimensional subspace S of the signal subspace S = 1Z(Aj Xq°) satisfying \\P$ — P§\\ < rj. If A satisfies the weak-1 
asymmetric RIP 

a < <;\ k (A; Jo) < ^{A; J ) < (3 (7.22) 
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for a and f3 satisfying 



and 



'1 - 



> 



\A* 



-Jl- 



^*ll2.oo 



> 



a — r)/3 ' 



(7.23) 



(7.24) 



then SA-MUSIC+SS-OMSP applied to S will identify J . 

Remark 7.21: With normalized A, Condition (17.23b is implied by Condition ( 17.241 ), which means that partial support recovery 
by SS-OMSP requires more stringent conditions than the subsequent MUSIC step in the guarantee of SA-MUSIC+SS-OMSP. 
This results in the same guarantee for SA-MUSIC+SS-OMSP as for SS-OMSP alone. However, in the numerical experiments 
in Section [DO the two algorithms exhibited substantially different performance. To interpret this, we compare SA-MUSIC and 
SS-OMSP conditioned on the event that a correct partial support of size s — r has been found. By the row-nondegeneracy 
condition on Xq°, it follows that d\m(P^ A ^S) = s — \J\ for all J C Jo with \ J\ > s — r. Therefore, with known partial 
support, the remaining steps of SS-OMSP are guaranteed by 



\A* 



4*112 
K 1 ll2,oo 



> 



a ~ rj/3 ' 



(7.25) 



which is obtained from Condition ( 17.20b with dim(Pj^, A sS) = s— | J|. In contrast, as shown in Fig. [5] when a partial support 
of size s — r is given, the condition in ( 17.8b for the guaranteed success of SA-MUSIC is substantially milder. 
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Fig. 5. Comparison of SA-MUSIC and SS-OMSP when a partial support of size s — r is given: trade-off between parameter 8 (for the weak-1 RIP) and r\ 
(for subspace estimate perturbation). The region below the curve provides a guarantee. 

Remark 7.22: With normalized A, Condition ( |7T22l +( [7T23l +( [724t is implied by the weak-1 RIP of A given by 5%*£(A; J ) < 
5 for S satisfying 



ll 


- s 




+ 6 



'■/sVT~S-VS 



Furthermore, if we assume rj = 0, then the guarantee for SA-MUSIC+SS-OMSP only requires 



S:^(A; Jo) < 



r + s 
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which as shown in Fig. [6] becomes less demanding as r/s increases. 




Fig. 6. Required <5 (for the weak-1 RIP) for the guarantee of SA-MUSIC+SS-OMSP in Theorem 17.201 for the noiseless case (rj = 0). The dot at the top 
right of the plot represents the weak-1 RIP a™5f^(A, Jo) > required in the full rank case, r = s. 



For the noisy case, Conditions ( 17.231 ) and ( 17.241 provide a trade-off between the parameters 6 and rj for the guarantee of 
SA-MUSIC+SS-OMSP, which is visualized in Fig. [7] As r/s increases, SA-MUSIC+SS-OMSP benefits from higher dimension 
of the signal subspace S. Compared to the guarantee of SA-MUSIC+SS-OMP in Fig. g] SA-MUSIC+SS-OMSP is guaranteed 
by a weaker requirement on A in terms of the weak-1 RIP. 
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Fig. 7. Trade-off between parameters <5 (for the weak-1 RIP) and rj (for subspace estimate perturbation) for the guarantee of SA-MUSIC+SS-OMSP in 
Theorem 17.201 Values (rj, <5) in region below the curve provide a guarantee. 



E. Implication of the Guarantees in Weak-1 RIP to the Oversampling Factor 

The results in the previous subsections were stated in terms of the weak-1 RIP. Given an upper bound on S^^[{A\ J ) (or 
bounds on a™+\(A] Jo) and f3™^i(A; Jo)), Section [VI] then provides explicit conditions on the parameters n, m, s that provide 
the weak-1 RIP for the matrices A discussed there. 

Example 1 (i.i.d. Gaussian A, asymptotic case) In the first example, we consider asymptotic analysis with an i.i.d. Gaussian 
A. By Proposition 16.51 if n and s go to infinity while satisfying n = o(e s ), i.e., s grows faster than Inn, and for 7 <G (0, 1) 
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then 

1 - 7 < aT^(A; J ) < p™f(A; J ) < 1 + 7 

with probability 1. Furthermore, in this asymptotic, || || 2,00 = 1 with probability 1. Assume that the estimated signal subspace 
S is error-free (77 = 0). Then, for the full row rank case, i.e., dim(S') = s, all SA-MUSIC algorithms reduce to MUSIC without 
partial support recovery and are guaranteed by m > s, which is also a necessary condition for the support recovery. On the 
other hand, if dim(S) = r < s, then SA-MUSIC+SS-OMSP is guaranteed by 

m > (l + s, 

which unfortunately does not converges to m > s as r/s — >• 1. The discontinuity at r/s = 1 is the limitation of the current 
analysis in this paper. However, it provides a valid upper bound on the oversampling factor m/s in the given asymptotic. 
Example 2 (structured A, non-asymptotic case) In the second example, we perform the analysis with matrices that arise 
in practical applications (e.g., spectrum blind sampling or DOA estimation). We use the results in Section [VI] for incoherent 
unit-norm tight frames and partial Fourier matrices. Fortunately, these matrices have normalized columns and we do not need to 
worry about ||^4*||2,oo any longer. Given 5, the weak-1 RIP, 5%+i(A; Jo) < S (or the weak-1 asymmetric RIP, a^^(A; J ) < a) 
holds with probability 1 — e if 

to>Ci(s + C 2 ). (7.26) 

For an incoherent unit-norm tight frame, C\ is a constant that depends only on 5. However, for the random partial Fourier 
case, Ci also depends on ln(s + 1), ln(n — s), and e. We summarize the explicit formulae for C\ and C2' 



• Random partial Fourier (Proposition 16.91 ) 

, /2(n-s)\ , , 
In f V M +ln(s + 1) , 

C 2 = 1. 

• Incoherent unit-norm tight frame with random support (Proposition 16.12b 

C 2 = 288X 2 ln(fc^)+l 

where K is determined by the coherence of A {e.g., K = 1 if A achieves the Welch bound). 

Substituting 5 (for the weak-1 RIP) into these expressions determines the explicit scaling of m versus s and the other 
problem parameters that will provide guaranteed recovery. As r/s increases, 8 increases and hence the oversampling factor C\ 
in ( 17.261 ) decreases. In particular, when r = s, SA-MUSIC reduces to MUSIC without need of any partial support recovery 
and hence is guaranteed by the weak-1 asymmetric RIP a™^(A; J ) > 0, which corresponds to C\ = 1 and C2 = 1 in ( 17.261 ). 



Ci 



2(3 + 5) 
3<5 2 
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In the presence of perturbation (77 > 0), Figs. [4] and [7] provide the required S for the guarantees and the oversampling factor 
is computed from 5 similarly. The relation between 77 and the number of snapshots N will be investigated in the next section. 

VIII. Analysis of Signal Subspace Estimation 

Unlike the previous works in sensor array processing BP which relies on asymptotics, we analyze the perturbation in the 
estimate of the signal subspace with finitely many observations. Combined with the results in Section lVTIl this analysis provides 
non-asymptotic guarantees in the noisy case for the new proposed algorithms directly in terms of the measurement noise. The 
results also enable us to extend the previous performance guarantees J4), fl31 . Ifl6l of MUSIC to the noisy and finite snapshot 
case, which was missing before. 

Assumption 1 (Noise) Given row s-sparse Xq G K nxN with support Jo, the snapshot matrix Y E K mxN is obtained with 
sensing matrix A E K mx ™ as 

Y = A Jo X£° + W 

where the columns of W are independent realizations of Gaussian vector w € K m with Ew = and Eww* = <T^I m - For the 
complex field case, we assume circular Gaussian distribution. We assume that Aj has full column rank. 
Assumption 2 (Number of snapshots) We assume that the number of snapshots TV is large but finite, more specifically, N 
satisfies N > m. It is also assumed that m > s which is required for support recovery by any method. 

Our assumption on the number of snapshots is motivated by the following considerations. In compressed sensing, the goal is 
usually to minimize the number of expensive measurements. Now, in certain applications of joint sparse recovery, taking many 
snapshots is a rather trivial task compared to acquiring many measurements in a single snapshot. For example, in spectrum-blind 
sampling (4), the number m of measurements per snapshot m determines the sampling rate, the increase of which is usually 
expensive and limited by hardware. In contrast, taking many snapshots only results in delay in the support recovery and is 
usually less expensive than raising the sampling rate. Similarly, in DOA estimation J5] and in distributed sensor networks |67l , 
increasing m requires more sensors, which is expensive, whereas increasing the number of snapshots N corresponds to delay 
in estimation, which is relatively less expensive. This motivates the setting N > m in the analysis of this subsection. 
Assumption 3 (Signal) We assume that the nonzero rows of Xq follow the mixed multichannel model given by 

A 7 ° = *A$ 

where £ ~K sxM with M < s is a mixing matrix that has full column rank, A S jjMxm j s a deterministic, positive, and 
diagonal matrix, and the elements of $ E K MxN are independent zero mean and unit variance Gaussian random variables. 
Note that rank(A 7 °) = rank(<p) = M with probability 1. We assume that $ is independent of W. The rows of A<E> correspond 
to realizations of M statistically independent sources, where the diagonal entries of A represent the magnitudes of the sources. 
The columns of Xq° in this model are independent realizations of Gaussian vector x € K s with Ex = and Exx* = 'I'A 2 ^'*. 

The mixed multichannel model generalizes the multichannel model ||26| proposed for the average case analysis of various 
methods for joint sparse recovery. With & = I s , the mixed multichannel model reduces to the multichannel model. However, 
with a rectangular mixing matrix ^ 6 ]K sxM for M < s, the mixed multichannel model can describe the "rank defect", which 
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is due to the correlation between the mixed sources, i.e., between the rows of Xq°. Such correlation, which often arises in 
the above mentioned applications, cannot be represented by the multichannel model, in which Xq° has full row rank with 
probability one for N > s. 

With the mixed multichannel model, the N columns of Aj Xq° £ K. mxN are independently distributed Gaussian vectors 
with zero mean and covariance matrix 

A Ja ^KH*A* Ja . 

Assumption 4 (Covariance matrix) We assume that there exists a significant gap between at least one pair of consecutive 
eigenvalues of T, more specifically, the covariance matrix T satisfies the following conditions given by the parameters reN 

and t, v,6 £ (0, 1): 

(l-0)A r (r)-(l+0)A r+1 (r) 

> (1 + 0)(1 + i/)tAi(T) (8.1) 
(l + 6)X k (T)-(l-e)X k+1 (T) 

< (1 -0)(1 -z/)tAi(T), yk>r. (8.2) 

Condition ( 18.11 ) asserts that there exists a significant gap between two consecutive eigenvalues A r (T) and A r +i(T). Condition 
d8.21 i asserts that there does not exist a significant gap between any two consecutive eigenvalues smaller than A r (r). Together, 
the two conditions imply that r is the maximal value that satisfies ( 18.1b (a gap can not be both big enough and small enough 
at the same time). 

When r is well conditioned, such that its condition number k(T) satisfies 

K 2 (T) A Ml < 1 -° 

[ ' X M (T) - (1 + 0)(1 + j/)t 

then r that satisfies ( I8.1l )-( l8^2l ) will assume its maximal value of rank(r) = A/. In this case A r +i(r) = 0, and ( 18.21 is trivially 
satisfied. Otherwise, we consider that T is ill-conditioned with one or more insignificant eigenvalues and set r to the index 
of the smallest eigenvalue larger than those considered insignificant. In this case, ( 18. 2t implies that A r+1 (T) is bounded from 
above by 

M- 

A, 



/I -0V 

v r+1 (r)<(i- I ,)rA 1 (r)^ — - . 



Proposition 8.1: Suppose that Assumptions A1-A4 hold and define 



Let S be the subspace spanned by the r dominant eigenvectors of T$ defined by 



a_ A Jo X^(X^rA} 



N 
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Let e, j] E (0, 1). If the number of snapshots N satisfies 



N > 2(m + s) 
'36 



N > 



N > 



s + In 



144 



+ s + In 



then with probability 1 — e, Algorithm Q] with parameter r computes an ?-dimensional subspace S such that 



iPf-Psll^r?. 



(8.4) 
(8.5) 



(8.6) 



(8.7) 



Proof: See Appendix INI 

When the noise variance crj, is small compared to Ai(r), Condition (18.6b dominates the first two and is simplified as 

>2/Ai(r)) 1 / 2 [m + .s + ln(8/e)r 



N = O 



ij- 



so that the number of snapshots required for the guarantee scales linearly in m. Alternatively, in the same scenario, Proposi- 
tion implies that (IQ i holds for 

1/4 



m + s + ln(8/e) 



^ M?)J v n 

Define the average per-sample SNR as the ratio of the powers of the measured signal and noise, 



n\A Jo x j °\\i _T.k=i^) 



M 



SNR 



n\w\\ 2 F 



Then, Ai(r)/cr 2 is related to the SNR by 



Ai(T) / Ax(r) 



-tr(r) 



• SNR. 



For fixed T and SNR, r\ scales proportionally to N~ x / 2 . With more snapshots, SA-MUSIC algorithms access an estimate of 
signal subspace with higher accuracy (smaller rj) and hence, as shown in Figs. [4] and [7] the admissible 5 increases, which 
results in decrease of the required oversampling factor m/s. Eventually, as N goes to infinity, the performance converges to 
that in the noiseless case. 



IX. Numerical Experiments 

We compared the performance of the two proposed SA-MUSIC algorithms: SA-MUSIC + SS-OMP and SS-MUSIC + SS- 
OMSP versus MUSIC 0, M-BPi 10 L and the two subspace greedy algorithms proposed in this paper for partial support recovery: 
SS-OMP and SS-OMSP. As an upper bound on the performance of SA-MUSIC, we included in the comparison SA-MUSIC 

10 The noise variance is given to the M-BP algorithm in the experiment. 
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with known ("oracle") partial support. The sensing matrix A was generated as randomly selected m rows of the n x n DFT 
matrix. The snapshot matrix Y = AX^ + Z was corrupted by additive i.i.d. circular complex Gaussian noise Z. 

The algorithms were tested on random Xq° of rank less than s. Singular vectors Uq and Vq of Xq° = UqT,qVq were 
generated as random orthonormal columns. In order to observe the effect of the rank-defect rather than ill-conditioning, the 
singular values of Xq° are set to a common value in the first experiment. The performance is assessed by the rate of successful 
support recovery As shown in Fig. [8] MUSIC fails when rank(XQ /o ) < s. SS-OMP is little affected by the rank-defect, but 
its performance does not improve much with increasing r. 

The performance of SA-MUSIC algorithms with greedy partial support recovery improves with increasing r. SA-MUSIC + 
SS-OMP and SA-MUSIC + SS-OMSP, labeled as "SA-MUSIC" and "SA-MUSIC+", respectively, in Fig. [5] performed better 
than M-BP in this experiment at much lower computational cost. SA-MUSIC with known partial support of size s — r is labeled 
as "SA-MUSIC with oracle" in Fig. [8] and shows perfect recovery when m > s + 1, which is nearly the necessary condition 
m > s. This suggests that the success of partial support recovery is more critical than the subsequent steps and leaves room 
for improving SA-MUSIC by combining with a better algorithm for partial support recovery than SS-OMP or SS-OMSP. 

For the noiseless case, the performance of SA-MUSIC+SS-OMSP and SS-OMSP coincides. However, with noise, the 
performance SS-OMSP severely degrades even for the full row rank case. For the full row rank case, all algorithms except 
SS-OMP and SS-OMSP (noisy case) were successful in terms of nearly achieving the necessary condition m > s. Again, 
SS-OMSP is sensitive to the perturbation in the estimate of signal subspace in this case. 

Regarding the computation, we compared the runtime of each algorithm by increasing the size of the problem. In this 
experiment, fixing n = (scale factor) x 64, we set the other parameters to s = n/16, r = \7s/8], and m = 2s. As shown in 
Fig. HI SA-MUSIC is about a 100 times faster than M-BPQ;. 

In order to see the effect of ill-conditioning, in the second experiment, we tested the algorithms on a random matrix 
row rank that has geometrically decaying singular values. The fc-th largest singular value of Xq° is 
set as ak(XQ°) = ft _ ( fe_1 )/( s_1 ) for fe = 1, . . . , s so that the condition number of Xq° becomes n. The singular vectors were 
generated randomly as in the first experiment. Fig. [10] compares the performance of the algorithms for the weak noise case. 
We note that M-BP is sensitive to the ill-conditioning of Xq° . When Xq° is well conditioned with k = 10, the dimension of 
the estimated signal subspace is equal to the row rank of X^ a (= s) and hence SA-MUSIC coincides with MUSIC without 
any SS-OMP step. However, when k = 50, the estimated rank r by using (14. 4t in Algorithm Q] with r = 10 3 is smaller 
than rank(X °) and hence MUSIC suffers from the rank-defect while SA-MUSIC provides consistent performance. The 
performance of M-OMP is invariant to the rank-defect but is poor compared to that of SA-MUSIC. 

1 1 M-BP does not produce an s-sparse solution in the presence of noise. In this case, the solution by M-BP has been approximated to the nearest s-sparse 
vector and the support is computed as that of the s-sparse approximation. 

12 For M-BP, we used an efficient implementation SPGL1 1681 . 1691 . On the other hand, the other methods were implemented as plain Matlab script. 
Therefore, the speed comparison does not unfairly favor SA-MUSIC. 
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Fig. 8. Test on rank-defect, n = 128, s = 8, N = 256, Left columns (noiseless): (a) rank(Xg ) = 4. (c) rank(Xg ) = 6. (e) rank(X^ ) = 8 (full row 
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X. Discussion 

A. Comparison to Compressive MUSIC 

An algorithm similar to SA-MUSIC named "Compressive MUSIC" (CS-MUSIC) has been independently proposed by Kim 
et al. (44). Although the main ideas in SA-MUSIC and compressive MUSIC are similar, in fact, the two papers differ in the 
following ways. First, the algorithms considered are different, in particular in the step of partial support recovery. Second, the 
analyses in the two papers are fundamentally different. The analysis of Kim et al. (441 depends heavily on the assumption 
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Fig. 10. Test on large condition number, n = 128, s = 8, rank(XQ°) = s (full row rank), N = 256. (a) k = 10. SNR = 30 dB. (b) re = 50, SNR = 30 
dB. 



that A is an i.i.d. Gaussian matrix and the size of the problem goes to infinity satisfying certain scaling laws. The authors 
showed that, under certain conditions, the probability of failure in the support recovery converges to in their "large system 
model". However, since no convergence rate is shown, the analysis provides no guarantee on any finite dimensional problem. 
In contrast, the guarantees in this paper are non-asymptotic and based on the weak-1 RIP. Our guarantees provide explicit 
formulae for the required m as functions of s and n, for various sensing matrices A including i.i.d. Gaussian, random partial 
Fourier, and incoherent unit-norm tight frame, whereas the analysis in P4l only applies to an i.i.d. Gaussian A. 



B. Comparison to the Guarantee of M-BP with the Multichannel Model 

Various practical algorithms including p-SOMP, p-thresholding, and M-BP, have been analyzed under the multichannel model 
(26), ll42l . Although it is restricted to the noiseless case, the average case guarantees of M-BP with the multichannel model 
ll42l has been shown to be better than the other guarantees of the same kind for other algorithms. Therefore, we compare the 
guarantees of SA-MUSIC algorithms to that of M-BP. 

For this comparison, we too assume that the snapshots are noise-free, i.e., Y = Aj a X^°. Nevertheless, the guarantee of 
SA-MUSIC algorithms in this paper is restricted neither to the noiseless case nor to the multichannel model. 

In the noiseless case, the signal subspace estimation is perfect, S = S = 1Z(Aj Xq°), with r = dim(S') = rank(XQ /o ). If 
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N > s where s is the sparsity level, then following the multichannel model has full row rank with probability 1. In the 
full row rank case, any SA-MUSIC algorithm reduces to MUSIC and provides the best possible guarantee with the minimal 
requirement aj| * (A; Jo) > 0, which reduces to m > s for certain matrices such as i.i.d. Gaussian A. This completes the 
comparison in the case N > s. Therefore, to compare the performance of SA-MUSIC and M-BP in the rank-defective case, 
we assume that N < s. The rank of Xq° is then determined by the number of snapshots, i.e., rank(X 7 °) = N and hence 
r = N. 

Previous work [42 Theorem 4.4] showed that M-BP is guaranteed with probability 1 — e if A satisfies the weak-1 RIP 

5:^(A;J a )<5 (10.1) 

for 5 satisfying 

2 In > T7 L ^ + 1 - ( 10 - 2 ) 



1-6 J V 1-6 J ~ N 
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Fig. 11, Required weak-1 RIC for the guarantees of M-BP (the average case analysis with the multichannel model with error probability t = 10 3 ), and 
SA-MUSIC (worst case analysis) for the noiseless case (a) n = 128, s = 8. (b) n = 1024, s = 64. 

SA-MUSIC+SS-OMP and SA-MUSIC+SS-OMSP are guaranteed by Theorems 177X81 and [77201 respectively. In particular, 
when Xq° follows the multichannel model, it is row-nondegenerate with probability 1 . Therefore, we need only compare the 
weak-1 RIP conditions in Theorems 17 . 1 8 1 and 17 . 201 to the weak-1 RIP given by dlO. Ib +( fl0.2t . Fig. QTJdisplays this comparison. 

For all three algorithms, as r increases, 6 required for the guarantee increases and hence the guarantee is obtained subject to 
a milder condition. FigQT|(a) shows that SA-MUSIC+SS-OMSP requires larger RIC and hence requires reduced oversampling 
factor m/s compared to M-BP when the size of the problem is small (n = 128). FiftfTTlfb) shows that SA-MUSIC+SS-OMSP 
provides a better guarantee (larger RIC) than M-BP in the regime r/s > 0.6 when n = 1024. 

The theoretical guarantee not withstanding, in our simulations, the recovery rate of the SA-MUSIC algorithms was always 
higher than that of M-BP and often substantially so. 

C. Comparison to the Analysis of Group LASSO in High Dimension 

The guarantee of Group LASSO by Obozinski et al. Il30l is quite tight and, in particular, achieves the optimal guarantee by 
the minimal requirement (m > s) for certain scenarios. However, their guarantee is asymptotic and only applies to Gaussian 
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A. In contrast, although our guarantee of SA-MUSIC+SS-OMP is not as tight as that of Group LASSO [[30], the guarantee is 
non-asymptotic, i.e., valid for any finite problems, and applies to wider class of matrices that arise in practical applications, 
including the partial Fourier case. 

D. Comparison to Compressed Sensing with Block Sparsity 

The joint sparse recovery problem can be cast as a special case of compressed sensing with block sparsity |fP7| . The block 
structure in the sparsity pattern in the latter problem has been exploited to improve the performance of the sparse recovery 
(cf. J67], ifTTll , (70), ll7T1l ). For example, the analysis 17T1 showed improvement of the structured sparse recovery over the 
unstructured original problem. However, the special case where the sensing matrix is a block diagonal with repeated blocks, 
which corresponds to the joint sparse recovery problem, is not covered by the proposed theory 1711 . 
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We list a few lemmata that will be used for the proofs in the appendix. 

Lemma A.l (fi72\ Corollary III.1.5]): Let Ay e K mxni and A 2 E K mxn \ Let A <E K mxn be the concatenation of A x and 
A2, i.e., A — [Ai, A2] where n = n\ + n.2- Then, there is an interlacing relation between the singular values of A and A\ 
given by 



Appendix 



A. Lemmata 



<y k {A) > <7fc(Ai) > <j k+n2 (A) 



(A.1) 



for k = 1, . . . , m. 



Lemma A.2: Let A € K' 



and let J Jo C [n]. Then, it follows that, for k = 1, . . . , | Jo \ J\, 



^k(A*j oUj Aj oU j) 



- ^k(Aj \jP n ( Aj ) 



A Jo\j) 



(A.2) 



> ^k+\j\(A*j oUJ Aj oUJ ), 



which is equivalent to 



CTfc(^J uj) > Vk{P n (Aj) A Ja\j) > < J k+\J\{A*j aVJ jAj aU j). 



(A.3) 
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Proof: Note that A* Jq ^jP^ a sAj \j is the Schur complement of the block AjAj of the matrix 



a * Jo \j a jo\j a \\Aj 

a *Aj \j A*jAj 



n*A* oUJ A JoUJ n 



where IT is a permutation matrix that satisfies 

A JoUj U = [A Jo \j, Aj}. 

Application of the interlacing relation of the eigenvalues of the Schur complement ll73l completes the proof. ■ 
Lemma A. 3: Let A £ K mx " and B £ K nxp where m> n and p> n. Then, 

||AB|| > a n - k+1 (A) ■ a k (B) 

for k = 1, . . . , n. 

Proof: Let A = E/iEiFj* and A = UiY> 2 VZ denote the extended SVD of A and B, respectively, where Si, V\, U 2 , E 2 £ 
K" xn . Let k £ [n]. Then, 

||ab|| = ||[/ 1 s 1 y 1 *c/ 2 s 2 y 2 *|| 

= HEiVTtfaEall 

> A„_ fc+ i(ViEiV7) • A fe (C/ 2 E 2 C/ 2 *) 

= (T n _ fe+ i(A) • cr fe (S) 

where the inequality follows from the Gel'fand-Naimark theorem |[72l Theorem III. 4. 5]. ■ 



B. Proof of Proposition 15.61 

We use the following lemma to prove Proposition [ 
Lemma A.4: Suppose that $ £ K sxr where r < s satisfies 

krank($*) = r 

and that ^ £ K sxfe for k < r spans a fc-dimensional subspace of 7Z(Q). Then, 

krank(^*) = k. 

Proof: There exists R £ K rxk such that R has full rank and = Let K be a set of fc indices from [s]. Since 
krank($*) = r implies rank(($*)i<-) = k, rank((\f , *)ir) = rank(ii* ($*)#-) = rank(($*)A') = fc. Since K was arbitrary, we 
have krank(\&*) = k. ■ 
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Proof of Proposition \5.6\ By the projection update formula, we have 

Note that TZ(Aj ± ) and P^ Aj )S are orthogonal to each other and both are subspaces of TZ(Aj ). Furthermore, by assumption, 
rank( A j ) = s and hence rank(A j 1 ) = | Ji | = s — r. Therefore, it suffices to show that 

Aim{Pi {Aji) S)=r. (A.4) 

Let U e K nxr satisfy S = K(A Jo U J °) and U [n] \ J " = 0. Then, (EOt is equivalent to 

rank(P^ (Aji) A Jo X /o ) = r, 

which holds since 

*r(P^ Aji) Aj U J °) 
= <?r(P^ Aji) A J(AJl U J ^) 

> <j s {A Jo )a r (U J °\- h ) >0 

where (a) and (b) follow from Lemma lA~3l which provides a lower bound on the minimum singular value of the product, and 
Lemma |A~2l respectively and the last step follows from the assumption that Aj has full column rank and krank(([/ /o )*) = r, 
which holds by Lemma |A.4| because Xq° is row-nondegenerate. ■ 



C. Proof of Proposition 16.71 

The proof of Proposition 16. II is based on the following theorem by Davidson and Szarek l57l . 

Theorem A. 5 f/ |57l Theorem 11.13]): Given m,n £ N with m < n, consider the random matrix G £ M. nxm whose entries 
are i.i.d. Gaussian following Af(0, —). Then, for any t > 0, 

p(oi(G0>l + V /f + t) <exp(-^ 
p(v m (G)<l-^-t) < ex P 

Proof of Proposition 16.71 Note that ( 16.4b implies 



8 + 1 



1 - Vl-S > VT+S-1 > 2 

V m 

Let j E [n] \ J. Theorem IA.5I implies 

p(WA/um) > VT+s) 
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and 



(a s+l (A JU{j} ) < Vl -<*) 



Since 



it follows that 



s + 1 . s + 1 



i - VT^5- \l > VT+S-i - \ > o, 

m V tyi 



\ A *Ju{j} A Ju{ 3 } -Is+i\\ > S 



< 2exp -— Vl + S-1 - 



s + 1 



2 \ V m , 

By considering the union of the events (ll^/u^-} A/u{j} — / s +i | > S) for all j <G [n] \ J, we obtain 

V{5^(A;J)>6) 



< 2(n - s) cxp ( -— VT+S - 1 - 



s + 1 



The RHS of JA.5t is bounded from above by e if 



(VT+5- l)Vm > Vs+1 + y21n 

which is implied by 



2(n - s) 



- l)Vm > V2^ (s + 1) + 2 In ^ ^ 



(A.5) 



(A.6) 



where we used the concavity of the square root function. Noting that iA.6i coincides with ( 16.4b completes the proof. 



D. Proof of Proposition 16.51 

Let j <G [n] \ J. Theorem IA.5I implies 



• (<T s+ l{A JU{j} ) < l- 7 ) 
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By considering the union of the events (er s +i (A/u-{j}) < 1 — 7) f° r a U J S [ n ] \ ^> we obtain 

P(o^f(A;J)<l- 7 ) 



m / ls + 1 
<{n- sjexp -— 7- 



Condition ( 16.8b implies that the RHS of (IA.7I ) is bounded from above by e. 



(A.7) 



E. Proof of Proposition 16.91 

Let j E [n] \ J. Then, by ll55l Lemma 2.1], Aj\ju\ satisfies 



>(\\A* JUU} A JU{J} - I s+1 \\ >6) 



< 2(s + 1) exp 



m 



S 2 



s 2(1 + 5/3), 

Therefore, by considering the union of the events (ll^juijj^jufj} ~~ Is+i\\ > S) for all j E [n] \ J, we obtain 

*:if(A; J) > 6) 

s 2(1 + 5/3 

Since ( 16.1 II ) implies that the RHS of (I A.8t is less than e, the proof is complete. 



< 2(n - s)(s + l)exp 



F. Proof of Proposition \6.12\ 

Let j e [re] \ J. First, we derive a probabilistic upper bound on ll^jy/j-iA/ulj} — I! where the probability is with respect 



to the choice of J. To this end, we use the relevant result in 16611. M66l Theorem 12] claims that, for s > 3 and a > 1, 

(A.9) 



n\\A* JU{j} A JU{j} -I s+1 \\ >5)< (i±T) 



if 



^14V( S + 1) In (i±± + l) a + ^±H||A!| 2 < e" 1 / 4 * 



(A.10) 



We assumed that p, < and \\A\\ = To bound the RHS of COl i from above by let 

lnf— ) 



By the concavity of the square root function, Condition dA. lOt is implied by 

144A' 2 (s + l)ln(^±I + l) ln fn-s\ | 2(s + l) 
m In (IE) n 



<5 2 

£ V5' <A "> 



Since '"^^ < 2 for all s > 3, ( TaTO is implied by 



(s + l) + 288if 2 In 
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By considering the union of the events (II ^4*/^-} A/u{j} — ^a+ill — <0 f° r a ^ 3 e N \ J> we obtain 

V(5^f(A; J)>5)< Y, n\\A*j u{3} A JU{3} - I s+1 \\ < e. 

je[n]\J 

G. Proof of Theorem \7.1\ 
MUSIC finds Jo if 



K||2 I -PoOfc |2 

mm — -f 2 — - — > max 

k&Jo \\a k 2 fce[n]\J afc 



(A. 12) 



Since a*?^(yl; Jo) > 0, it follows that all columns of Aj Q are linearly independent. Furthermore, since rank(X /o ) = s, we 



have 



S = "fc(4/ ). 



(A. 13) 



By the triangle inequality, 



\P§^h \\Psa k \l 



< 



hkh \\a k h 

MP§-Ps)akh 



l^s-^ll <'?• 



(A. 14) 



Then, for all k G Jo, 



"ii — - ~1 — ii 

a.fc||2 ll a fc||2 

<b) \\Pn(Aj )akh 



\a k h 



1 - 77, 



where (a) and (b) follow from ( IA. 141 > and ( IA.131 , respectively. Then, we obtain a lower bound on the LHS of ( IA. 1 2t given 
by 

ll P 5«fcll2 



mm ■ 

k&Jo \\ak\\2 



>l-r). 



(A. 15) 



Similarly, by ( IA.13I ) and ( IA.14I I. we have 



\\P§ a kh < H P ^(A Jo) afc|| 



ll«fe||2 ||afc||2 
for all k £ [n] \ Jo where || ^^(a 7o ) a & || 2 is further bounded from above by 

\\Pn ( A Jo) ak\\l _ ll^(A Jo )°fclll 
hk\\l \\a k \\l 

Kill 



^ x _ ^+l(^JoU{fc}) 



Ofclla 
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a* III 



< 



i {<ff(A;J )} 



111,00 



where (c) holds by Lemma lA. 21 and (d) follows by the definition of the weak-1 asymmetric RIP. Then, we obtain an upper 
bound on the RHS of dA.12b given by 

1/2 



max || Psa fe || 2 < 

ke[n]\J b 



{<;f(AJ )} 

J- ii ..II') 



-4*lll,oo 



(A. 16) 



Combining ( |A.15t and ( IA.16I ), we note that MUSIC is guaranteed if A satisfies the weak-1 asymmetric RIP given by 

a^(A;J )>a 

for a > satisfying 



a>Pl 2 ,co{l-(l-2r7) 2 } 



1/2 



H. Proof of Proposition \7.6\ 

Let d = dim(P^ j4/ ^5). By the row-nondegeneracy condition on Xq° and Lemma lA.41 it follows that d > 1. There exists 
Q e K( s - fc ) xd where fc = | J| such that Q*Q = I d and K(A Jo \jQ) is a subspace of S. Then, it follows that 

Vd(Pn(Aj)Aj \jQ) = a d {P^ {Aj) P s A JoX jQ) 
(*) 

< ^(P^ (Aj) Ps) • ||A JoU Q|| 

<^(P^ (A/) Ps)-||A/ ||. (A.17) 
By the variational characterization of the singular values, (*) is bounded from below by 

(*) > o- s - k (P£ (Aj) A MJ ) > a s (A Jo ) (A.18) 



where the last step follows by Lemma IA. 2 1 

Let k = ai(Aj )/a s (Aj ). Combining JA. 17b and ( |A.18t , we obtain 

Vd{Pk {Aj) P S ) > K- 1 . (A.19) 

Since dim(S) = dim(5) = r, it follows that 

\\P§-P S \\= sup inf ||i-i|| 2 

\\x\\ 2 = l ||.T||2 = 1 

= sup inf \\x-x\\ 2 (A.20) 

||*||,=1 Nl^ 1 
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and hence ||Pg — P§\\ < i] implies the followings: for all x G S, there exists x G S such that \\x — x\\2 < ?7||5||2- Similarly, 
for all x G S, there exists x G S such that ||i — x\\2 < ??||^||2- 

The following identity is well known (see e.g., ||74]| ): given two subspace Si and 5*2, 



|P Sl - P S J = max{||P^P S2 ||, \\P^P Sl \\}. (A.21) 



Note, by (IA.2U , that 



\Pp± q Pp 1 ^ SI 

TZ (Aj) TZ (Aj) 



(*) 



i^l 5 P pi S \\\. (A.23) 



(**) 

First, we derive an upper bound on (*), which is equivalently rewritten as 



(*)= sup inf Jz-y\\ 2 . (A.24) 



Il*ll2=l 



Let z G Pj^, Aj ^S satisfy ||z|| 2 = 1. Then, by ( |A.19t , there exists x G S such that Pk(Aj) x = z an< ^ INI 2 — K - ®y ^ e 
argument following dA.20b . there exists x G S such that ||x — x||2 < ^INh < V K - Then, it follows that 

II p n(A.,)X - P-k{A.,)X h < \\x -&h 



and hence 



inf \\z — y\\2 < r]K. 
Since z was an arbitrary unit-norm element in P^, A sS, we obtain 

(*) < (A.25) 
Next, we derive an upper bound on (**), which is equivalently rewritten as 

(**) = sup inf _\\z-y\\ 2 . (A.26) 

11*112=1 

Since 



<MPn(Aj)Ps) + \\ p s- p s\\ 
<^d(P^ (Aj) P§) + V 



S) 
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by (1A.19I I, we obtain a lower bound on ^diPj^^^Pg) given by 

°d( P n Aj )Ps) >^-V= (A.27) 



Let z e P-ji(Ai)S sat i s fy INh = 1. Then, by (IA.27I I. there exists x € S such that P^, Aj ^x — z and ||x||2 < \- VK - By the 
argument following ( IA.20I ). there exists x € S such that ||i — £||2<?/||i||2< ill K ■ Then, it follows that 

II P K(Aj)& - P K(A. 7 )^ II =3 < ||* - %h < 



1 — rjK 



and hence 

inf \\z-y\\ 2 <- 

y£Pi (Aj) s 1 - m 

Since z was an arbitrary unit-norm element in P^, Aj sS, we obtain 

rjn 
1 — rjK 



(**) < ^ . (A.28) 



Applying ( lA.25b and ( IA.28t to (IA.23I ) completes the proof. 



/. Proof of Theorem \7.7\ 

MUSIC applied to S is successful if 

. \\P§akh „ Il p s"fcll2 
mm — > max — . (A.29) 

k€Ja\Ji ||Ofc||2 fce[n]\J ||Ofc||2 

Since ( 17. 5t implies rank(Aj- ) = s and JT^° is row-nondegenerate, by Proposition 15.61 we obtain 

S + n(Aj 1 )=n{A Jo ) (A.30) 

and hence, by the projection update formula, 

p n{A Jo ) = Pr.(Aj 1 ) + Pp£ §■ ( A -31) 
Since the augmented subspace is given by 5 = S + TZ(Aj x ), by the projection update formula, we have 

p§ = PnA Jl )+P P ^ Aji)3 - c A -32) 

By the triangle inequality, it follows that 



l P S a fc|| 2 \\ P K(A., ) a k U2 



||a fc || 2 || a-fc || 2 

\\( P s - P K(A., ))ak\\ 2 



< 

\\akh 
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By (|A.301 > we continue by 

[[(-^(AjJ+g" P -R.(A J:L ) + g) a fc|| 2 
|K||2 

\\(Pp± $-Pp->- S) a k\\n 

^ °pl q — °p4 5 

~ ct s (Aj ) - r?o-i(A Jo ) 

< (A.33) 

where (a) follows from (IA.3U and (IA.32b . and (b) follows from Proposition 17.61 because a s (Aj a ) > r)ai(Aj ) is implied by 
(1731 and ( f7~6b . 

By dA33l , it holds for all k G J \ Ji that 



afclla || fife || 2 a-r)(3 

= 1 11(3 



a — r)/3 

This yields a lower bound on the LHS of ( IA.29I ) given by 



' g |3 > 1 - tH a . (A.34) 



fceJoVi 1 1 «fc 1 1 2 a-vP 



Similarly, by ( IA.33I ), it holds for all k G [n] \ Jo that 



\\Psakh < \\Pn(Aj )akh t rj/3 



II a fc || 2 1 1 «,fe 1 1 2 a-riP 

" v ' 

(*) 

where (*) is further bounded from above by 

Kill - \\ p n(A jQ ) a k\\l 



(*) 



2 



Kill 

^( P 7i(A Jo ) a fc) 

Kill 

M j _ g s 2 +l(^J U{fc}) 



= 1 - 



\a k \\l 



< 1 

< 1 



IMI1 



|afc||| 

where (c) follows by Lemma IA.2I This yields an upper bound on the RHS of JA.29I ) given by 



max || -Psa*; || 9 < A /l — .. 1 , . . ''" lv " — -. (A.35) 



4X 



Finally, by applying the bounds in ( IA.331 > and (IA.34I I to (|A.291 l, we note that jA.29\ is implied by the weak-1 asymmetric 

RIP given by 

i _ -q/3 > a 2 i 



a-vP Y 11-4*11 2,00 a-r}P 
which is equivalent to ( 17.61 ). This completes the proof. 

J. Proof of Proposition \7.11\ 

The following lemma is used in the proof of Proposition 17.111 

Lemma A.6: Suppose that A G K mxn and P G K nxn is an orthogonal projector in K". Then, for all x,y G K n , 

\(APx,APy)\-\(Px,Py)\ < \\PA*AP - P\\ ■ \\x\\ 2 ■ \\y\\ 2 . (A.36) 



Proof of Proposition \7.11\ Given J C J , the next step of M-OMP will be also successful if 



.max \\a*P n{Aj) Y\\ 2 > . max . \\a*P n(Aj) Y\\ 2 . 
We derive a sufficient condition for ( IA.37b . First, we derive a lower bound of the LHS of ( IA.37I ). 



(A.37) 



A-ineq 

> .max^ \\a*P^ (A]) Aj a X^h - h^A^h 

> ^MPn(A r) Aj Xo°h - II^W P 7Wkoo||Wll 



> .max 7 ||a*P^ (Aj) A Jo X J ° || a , - || A* 



2,00 



\\w\\ 



(A.38) 



Let Ilj G K" XTl denote the coordinate projection that satisfies ILjej = ej for j G J and Il/e^ = for j G [n] \ J. Then, for 
all j G Jo \ J> we bound the term (*) in dA.38b from below by 



(*) = Wa*P^ Aj) A Jo \jX^ J \ 



= sup 
= sup 



J \J, 



sup 

ll*l|2 = lv. 



( a j> p n(A.,) A J \J x o 
( p n(A., ) a j > p n(Aj ) ^ J \ J X o ^ J z ) 
{Pn( Aj )A^j \je 3 , P£ {Aj) ATlj \jX Q z) 



(A.39) 



(**) 



Note that 



(a) 



(**) > |(n Jo \jej, U Jo \jX z)\ 
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- \\Hj \jA*P^ {Aj) AUj \j - u JoV \\ 
■ ||n MJ eJ 2 ||n /(AJ X z|| 2 
= \(Hj \jej, IL Jo \jX z}\ 

~ W A Jo\J P K(Aj) A Ja\J ~ I\J \J\\\ \\ U Ja\J X O z h 

( & ) , 

> |(n JoV e J; U Jo \jX z)\ - S\\Uj \jX z\\ 2 (A.40) 



where (a) follows from Lemma IA.6I and (b) holds since 



\\ A Jo\J P K(Aj) A Jo\J ~ T \Jo\J\W 

= maxjl - X\j \j\(A* JoX jP£ (Aj) Aj \j), 

Ai^mj^^oV)- 1 } 
< max{l - \s(A* Jo A Jo ) 7 A^AjJ - l} 

= ||A^ Jo -I 8 || <<S 



where (c) follows from Lemma |A. 21 and (d) is implied by (|7.9l l. 
We then continue (|A.39t by using ( |A.40t 



(*) > sup |(IIj- \jej, n Jo \jX 2:)| 

II ^ || 2 = 1 

— sup <5||II/ \,/Xoz||2 

II -^11 2 = 1 

= l|ejnj \jXo|| 2 - 5\\n Jo \jX \\ 

= \\e*IL JoV X \\ 2 -6\\X^ J \\. (A.41) 



Maximizing the lower bound on (*) in (|A.411 > over j e Jo \ J yields 

.max ; ||a*P^ (Aj) A Jo X /o || 2 

> max ||ejn JoV Xo|| 2 -<J||^o oV ll 

J 6 Jo \ J 

= ||* W |koo- <S||X MJ || 

and hence the LHS of (IA.371 > is bounded from below by 

>\\X^ J \\ 2 , 0O -6\\X^ J \\-\\A*\\ 2 , 0O \\W\\. 
Next, we derive an upper bound on the RHS of ( |A.37| >. 



i» l|0 ^) y||a 



(A.42) 
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A-ineq 

< .^ X , ll i i w(A J )^ ^o lla + l|o^(A J )^lla 

j£{n\\J 

< max llo^^Ajo^lla + HP^jOjllallWll 

< max Js- p w(A,)4/„*o°ll2 +||^*||2,oo||W||. (A.43) 

J G [n] \ Jo ■" , 



(*) 

Similarly to the previous case, we derive an upper bound on (*) for all j e [n] \ Jq. 

(*) = \\e*A*P£ iAj) Aj Xt°\\ 2 
= \\e*A*p£ iAj) AX \\ 2 

= \\ e j U (J \J)U{j}A*PK(Aj) AU (Jo\J)U{]} X oh 



sup 

IWI»=i 



( p n(Aj)AR(J \J)u{j} e h 
P K(Aj) AIl (J \J)u{j}Xnz) 



By Lemma IA.6I the last term is at most 



< sup \(n-(j \j)u{j}ej, n-(J \J)u{j}X z)\ 

M|2 = l 

+ S UP \\ A (J \J)U{j} A (Jo\J)U{3} ~ I s-\J\+l\\ 

II 2=1 

' ll n (./ \J)u{i}^02;||2 

= sup |(ej, X z)\ 

Z 2 1 V V ' 

+ SUp ||^*j \J)u{j}^(Jo\J)U{j} -Is-IJI+lll 

Il 2 ll2 = l 

' ll n (J \J)u{i}^0^||2 

= W A *J \J)u{j} A {J \J)u{j} 

■ ll n (Jo\J)U{j}^o|| 

= \\ A ij \j)i>u} A (Jo\j)u{j} -4-iJi+iii • n^o uV/ i 



<PW}^ou{,-}-^+ill-IK 

<S\\xJ oXJ \\. (A.44) 
Maximizing the upper bound on (★) in (I A. 44b yields an upper bound on the RHS of ( IA.37t given by 



max \\a*P n(A .YW2 
je[n]\J 3 K[Aj> 

<S\\X^ J \\ + \\A*\\ 2 ^\\W\\. 



(A.45) 
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Applying the bounds in ( |A.421 > and ( |A.451 > to ( |A.37t , we conclude that, for the success of the next step, it suffices to satisfy 

||*o JoV lkoc - 2S\\X^ J \\ > 2||A*|| ai00 ||W||, 

which is Condition ( 17.10b . This completes the proof. ■ 
Proof of Lemma \A.6\ The proof follows from the properties of an inner product: 



\(APx,APy)\-\(Px,Py)\ 

(a) 

< \(APx,APy) - (Px,Py)\ 
®\(x,PA*APy}-(x,Py}\ 
= \(x,(PA*AP-P)y)\ 

< \\PA*AP-P\\ \\x\\ 2 \\y\\ 2 

where (a) follows from the triangle inequality, (b) follows since from the idempotence of P. ■ 
K. Proof of Corollary^7l5\ 

Recall that SS-OMP is M-OMP applied to P§ instead of Y. Since we consider the first k steps of SS-OMP, we assume that 
J C Jo satisfies |J| < k — 1. Since (17. 1U implies \\Aj Aj —I s \\ < 1, all columns of Aj are linearly independent. Therefore, 
the projection Pg is written as 



P 3 = A Jo $($*A*j o A Jo <S>) $*A%. 



Since $*$ = I r , it follows that 



\\$*A* Jo A Jo <f>\\ 

= ||$*|| •||^ A Jo ||. ||*|| 

<\\A%A Jo \\ 

= 11 A/o f (A.46) 



and 



a r (^*A* Jo Aj ^) 
>a s (A* Jo A Jo ) 

= <j 2 s (Aj ) (A.47) 



where the inequality follows from the variational characterization of the eigenvalue since $$* is a projection onto an r- 
dimensional subspace. 
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Note that 



maX {l-<7 s 2 (Ajo), IIA/o || 2 -1} 
= \\A* Jo A Jo -I s \\ 

< 5:^(A; J ) < 6 (A.48) 



where (a) follows from ( 17.111 ). 
Let T = (<f>*A* Jo A Jo <py 1/2 . Then, by COTl i and (KM) . 



Ai(T) < — ±— < 



°s{A Ja ) VT^S 
and similarly, by JA.46I ) and ( |A.48t , 

Ar(T) - > 7m- 

Let U G K nxr satisfy C/MVo = and U J ° = Then, P§ is also written as 

P§ = A{UTV*) 
where V is given by V — Aj QT and satisfies V*V = I r . 

\Ps-Ps\\<V, 



Since Pg satisfies 



we can write Pg as 



P § = A{UTV*) + Z 



for some Z satisfying \\Z\\ < rj. 

The following two inequalities applied to Proposition 17. Ill complete the proof. 

\\u J °\ J rv*\\ = ||c/' /oV7 t|| 

< \\UT\\ < \\U\\ ||T|| 
= 11*11 ITK^ 

and 

||l/ J »\ J T7*||2,oo = ||C/ JoV T|j 2 ,oo 

> \\U J ° XJ h,oc > P\J\ + 1* 

VTTs ~ ViTs 



> 



VT+S 

where the last step follows since | J\ < k — 1. 
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L. Proof of Lemma \7.17\ 

By the Cauchy-Binet formula, it follows that 



1 = det($*$) 
= Yj dct(($ J )*$ J ) 

Je[s],\J\=r 

= |dct($ J )| 2 . 

Je[s],\J\=r 



The determinant of $ J is bounded from above by 



|dct($' 7 )| < n ||$ J efc || 2 < (-Ell* J e fc ||!j 

k=l \ k=l / 



'■/'I 



where the first inequality follows from Hadamard's inequality and the second inequality follows from the inequality of arithmetic 
and geometric means. Therefore, 

2r/q 



which implies 



Note that 



1< E {±£\\* J **\\l\ 

j£[s],\J\=r \ k=l 



< 



max 



2r/q 



r\(s - r)\ je[s],\j\=r yr ^ 



efclll 



_^__fl^(p fc ($)) 9 



2r/q 



r\(s — r)\ \r 



k=l 



| \ -9/2r 



r!(s — r)\ 



< 



-E(p fc ($)) 9 - 



k=l 



(A.49) 



5>k(*)) 9 <5>*(*))* + E (M*))' 

j=l k=l k=s-r+l 



< (a-r) + (2r-a)(p,_ r ($))«. 



Combining ( |A.49t and (|A.50b , we obtain 



p 8 -r($) > 



r!(s — r)! 



2 - 



V 



M. Proof of Proposition \7.19\ 

Assume that J C J is given from the previous steps where k = \ J\ < s. 



Define 



1i 



A P -R(A.j) a 3 



(A.50) 



(A.51) 



(A.52) 
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for j G [n\. Then, ||<7j||2 = 1 for all j G [n]. 

For the guarantee of the next step of SS-OMSP, we need to show that 



max \\{P P ± s)qj\\ > max \\(P p ± §)qj\\. (A.53) 

jeJ \J p mAj) bru " je[n]\J p nuj) b '* J " 



By the triangle inequality, it follows that 



< || (Ppx 5- p P x s)*|| 

- ||P ^ 7 ,^~ jP ^(A J ,5ll 



a — 77/? 

where the last step follows from Proposition 17.61 since JT,^ is row-nondegenerate and ( 17.20b implies a > s/fj/3. 



Then, JA.53b is implied by 



IK P ^ (4 ,s)?;ll > . max ||(Ppx + — (A.54) 



Let Q = [51, ... , g n ] G K* nxn where g 3 for each j G [n] is defined in ( IA.52I ). Then, Qj \j satisfies 

,n \ ^ a s-^ p -k(A.,) A J a \j) 
max je . /oV/ ||P K(j4/) a 3 |j 2 

W a s (A/ ) 



max je./o\J Kll 

- ||^*||a,oo 
(c) a 

> Trmi ( A -55) 

A* 2.00 



where (a) and (b) follow by Lemma |A. 21 and Lemma |A. II respectively, and (c) is implied by ( 17.19l >. 
First, we bound the LHS of (IA.54I ) from below by 

max J (/V s)qj\\ 

= max MPpx s \\ 
jeJo\J K(Aj> 

= \\Qj \J P P£ {Aj) sh;°° 
> \\Q*j \j p p£ (Aj) s\\f 



V 's — k 

,dim(P^ (A/) S) 



<g ( Sm'""""^!-wAw) 1 (A . 56) 



(*) 

where (d) follows from the variational characterization of the Ky-Fan norm l72l . For the special case when r = s, S is given 
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by S = S = K{A Jo ) and hence 

&™(Pn(Aj)S) = ™M p n(Aj) A Jo\j) = s - fc > (A.57) 



which follows since cr s _k(Pj£, A sAj \j) > a s (Aj ) > by Lemma [A. 21 In this case, the RHS of ( |A.56t reduces to 

(*) 



YfiJx Vs-k+i-e(Qjo\j)\ 



s — k J 

\Qj a \j\\W /2 _(E^hMY /2 



s — k J \ s — k 

Otherwise, if r < s, we derive a lower bound on the RHS of ( |A.56t by 



.|A*|| a ,oo V s-\J. 
where (e) follows from ( IA.55I ). 
Next, we derive an upper bound on the RHS of dA.54b . Assume that j £ [n] \ Jo- Since 



x V 1 ' 2 

A J \J P K(Aj) A Jo\J 



and 



* = Pn {Aj ) A J \jR- 



Then, $ is an orthonormal basis for H{P^ a ^Aj \j) 
Since Aj has full column rank, we can also define 



* 4 Aj(A* J Aj)- 1 ' 2 , 



Then, ^ is an orthonormal basis for TZ(Aj). 
We bound the RHS of jA.60\ from above by 



= 11***112 



(A.58) 



> / im(P ^^ } (A.59) 



Pk( Aj )S C P£ iAj) K(A Jo ) 
□x 

it follows that 

11(^^)^112 < \\{Pn { P^A Jo ^h- (A.60) 
Since P^, A ^Aj \j has full column rank by dA.57b , we can define 
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\[qj>®}*PK( Aj )[<lj,*}\\-l} 
< maxjl - A s ([9j,*,*]*[gj, *,*]), 

Ifc-,*,*]*!^,*,*]!-!} 

= [^,$,\&]*fe ) $,*]-J H .i 



(A.61) 



where (f) and (g) follow from Lemma I A. 1 1 and Lemma IA.21 respectively. 
Note that [$, \I>] is an orthonormal basis of TZ{Aj a ). Therefore, the RHS of dA.6U is bounded from above by 

g,-, -I s +i 




|| ill 



I [*>*]*«,• 111 



= ll[ ( f>,*]**ll2 

= II^K(A Jq )9j||2 

= 1 _ ll^7Z.(u4j )?7'lll 
H^7i(Aj ) a jll2 







= 1 - 



(hi 

< 



-, _ °"s+l(^JoU{j}) 



x jll2 



2l 



l^llioc 



where (h) follows from Lemma IA.2I and (i) is implied by ( 17.191 ). 
Since j was arbitrary in [n] \ Jo, dA.6U implies that 



max 



\\(Pn(Pi lA7 ,A JoXJ ))Qjh<Jl- 



I^IILo' 



j£[n]\j"' '"^iz(Aj) /i Jo\J 

Applying ( lA.59b , and ( lA.62b to ( lA.54b . we obtain the following sufficient condition for ( IA. 53b 

277/3 



(A.62) 



' dim ( P K(Aj) S ) 



- 4 1- 



l^lll.oo - 



V S-|J| ||^*||2,oo 

which is equivalent to ( 17.20b . For the full row rank case, applying ( 1A.58I I. and ( IA.62I I to ( IA.54I ). we obtain Condition ( 17.211 ). 
which implies (IA. 53b . 
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N. Proof of Proposition \8. 1 1 

To prove Proposition 18.11 we use the following lemmata. The proof of Lemma IA.8I is deferred after the proof of Proposi- 
tion o 

Lemma A.7 (A case of l[72\ Theorem VII.3.3]): For positive semi-definite matrices Ti, T-2 <E K mxm and r € {1, . . . , m— 1}, 
let Si and 52 be the subspaces spanned by the r-dominant eigenvectors of Ti and I^, respectively. Then, 

31|r x - r a || > HP^PsJ • [A r (r 2 ) - A^ro] . 



Lemma A. 8: Given m, n6N with n > 2m, consider the random matrix G E C" xm whose entries are i.i.d. circular Gaussian 
variables with zero mean and variance -. Then, 



for t > satisfying 



2m t 
n 3 



< 1. 



Proof of Proposition \8.1\ Recall that, given the sample covariance matrix Ty — ^jr~ of the snapshots Y , Algorithm 1 



computes T defined by 



Also recall that Ts is defined by 



r — Ty — X m (Ty)I n 



T s 



TV 



N 



Let D £ ]g mxm denote the distortion matrix defined by 



Then, D is decomposed as 



where 



d = r - r.< 



D ^noise ~t~ Across H~ ^bias 



A WW* 2 



TV 



Z J 

w in ' 



n — 

cross — 



A Ja X^W* , W(X Jo )M} 



A/ 



A 



Arias = - A m (Ty)] In 
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It follows that the sample covariance matrix Ty is decomposed as Ty = + cr^/ m + -Dci-oss + -Dnoise- Viewing Ty as a 
perturbed version of T3 + <^I m with distortion D nolse + D cmss , we bound the perturbation in the m-th eigenvalue by Weyl's 
Theorem as 

|A m (ry)-A m (r s + ^/ m )| 
v . ' 

(*) 

< py-r s -0*i m || 

— ||-^noise H - Across || 

— ||-^noise|| H - ||-^cross||- 



Now, since rank(T5) < s and s < m, (*) reduces to 



K - X m (T Y )\ = \\D hms \\. 



Therefore, ll-Dbiasll satisfies 



I Arias || < \\D D 



-Across 



For k G N where k < N, let Z fe G 



-fcx JV 



be an i.i.d. Gaussian matrix such that KZi. = and E 



z k (z k 

N 



= lb- Then, we 



define for k = 1, . . . , N — la family of random variables 

Sfe = 

Using we bound the distortion terms from above by 

WW* 



z k (z k 



N 



h 



\D„ 



(JIN 



and 



I A 



<2a w \\A Jo ^K 
< 2a w \{ /2 (r) 



<z>w* 



<J,„N 



i r 

N 



'm+M 



where Xi ~ X 2 denotes that random variables Xi and X 2 have the same distribution. Therefore, for any fixed c > 0, since 
P(£/c < c) decreases in k, it follows that 



\D\\ < c) > P (2£ m a 2 + 2e„ i+s( T„A; /2 (r) < 
> P (2£ m+s a w \a w + 2Ai /2 (r)l < c 



(A.63) 
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Rank estimation From the sample covariance matrix V, Algorithm 1 determines the rank r as the maximal number k that 
satisfies 

Afc(f)-Afc+i(f) >TA x (f). 



Therefore, to guarantee that Algorithm Q] stops at the desired rank r, we need sufficient conditions for 

A r (f)-A r+ i(f) >rAi(f), 
A fe (f)-A fc+ i(f)<rAi(f), Vfc>r. 

By Weyl's perturbation theorem |72l Corollary III. 2. 6], it follows that 



max A fc (r)-A fc (r s ) < \\T-T S \ 

fc£[m] 



and hence 



A r (r)-A r+ i(r) > A r (r s ) - A r+1 (r s ) - 2||d| 



Ar(r) Ai(r s ) + ||D|| 

Again, by Weyl's perturbation theorem ll72l Corollary III. 2. 6], it follows that 



<b<b 

Afc ( — ) -A fc (/ M ) 



N 



< 



N 



Im 



for all k = 1, . . . , M and hence 



By ( IA.67I ). it follows that 



/<j><j>*\ /<j><j>*' 
1-£m< Am < Ai — — ) < 1 + 6/ 



V N 



N 



(A.64) 
(A.65) 



(A.66) 



(A.67) 



A fc (r s ) > (i-6/)A fc (r) 
A fc (r s ) < (i + £ M )A fe (r). 



(A.68) 
(A.69) 



By Lemma IA. 81 it follows that 



&> 6 ^ + g 5 ^|< e . 



AT 



(A.70) 



Therefore, by ( 18.4b . ( 18.5b . and dA.70b , it follows that £m < # with probability 1 — e/2. Therefore, we assume £m < # in the 
remaining steps of the proof. 

Since £m < by (IA.66b . ( IA.68b . and ( |A.69t the following condition is a sufficient condition for ( IA.64b : 

(l-fl)A r (r)-(l + fl)A r+ i(r)-2||D|| 



which is equivalent to 



WW < 



(l+e)x 1 (T) + \\D\\ - r ' 

(i - g)A r (r) - (i + g)A r+ i(r) - (i + g)rAi(r) 

2 + r 



(A.71) 
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Since T satisfies d8.lt , we obtain a sufficient condition for JA.7 1| > given by 



\\D\\ < (1+^ (A.72) 



Ai(T) " 2 + r 

Similarly, since T satisfies (18.21 ). we verify that (IA.72I ) is also a sufficient condition for (IA.65I ). 

Recall that Algorithm 1 computes S as the subspace spanned by the r dominant eigenvectors of T, Next, we derive conditions 
that guarantees that ||Pg — P§|| < r/ where S is the subspace spanned by the r dominant eigenvectors of T$- 

We apply Ti = T and T2 = T$ to Lemma |A. 71 and obtain 



3||£|| > \\P§ ~ P§\\ ■ [Ar(T s ) - A r+1 (r s )] . (A.73) 

Since £m < 9 and T satisfies (18. 11 1. 

A r (r s ) - A r+1 (r s ) > (l - 0)A r (r) - (i + 0)A r+1 (r) 

> (l + (9)(l+i/)rAi(r). (A.74) 
By ( IA.73I ) and ( IA.74b . we note that \\P§ — Pg\\ < r] is implied by 

M^^l + flja + i/JrAiCT). (A.75) 

Recall that C n ^.e. T is defined by 

C we , T 4( 1 + ,) Tmin |(i+^ j 

Then, by (lA63l 



2[a.2/A 1 (r)"T2t CT 2 /Al(r)) i/ 2] 



< nr .,n 7^ 7\ T^vTToi (A.76) 



implies jA.72> and ( IA.75D . 

Finally, we note that, by d8H ), d83i and jAJOh it follows that ( |A.76l l holds with probability 1 - e/2. ■ 

Proof of Lemma \A.8\ We use the following lemma to prove Lemma IA.8I 
Lemma A.9 ( K75\ Lemma 36]): Consider a matrix B G ]K™ xm (m < n) that satisfies 

||B*-B-/|| < max(S,5 2 ) 

for some <5 > 0. Then, 

1-6 < o m (B) < ai{B) < 1 + <5. (A.77) 

Conversely, if B satisfies (IA.77b for some 5 > 0, then - I\\ < 3 max(<5, S 2 ). 

We can write G as 

G = — j=(Grz + jGjra) 
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where GR e ,Gi m G R nxm are mutually independent i.i.d. Gaussian matrices whose entries follow Af(0, —). Then, 



where (*) is a block matrix 



whose block entries are given by 



\G*G-L 



GR e — Gl m 
G]m Grs 



GRe —Glm 
Glm GR e 



-I 



2 s 



(*) 

An A12 
A21 A22 



An — - [(G Re GR e — Is) + (G^Glm — I s )] 

A12 = — (G[* m GR e — G Re Gi m ) , 
A21 = A* 12l 



A 22 = A 



11 • 



Then, 



1 

< - 
~ 2 



G*G-I S \\ 

( — GR e )* ( — GRe) — I s ( — GRe)*Gi m 
GJm(~ Gr c ) G* m Gi m — I s 

Im^Im ~ J s <- r i m ( - r Re 
G Re Gi m G Re G Re — I s 

||[ — GRe Gi m ]*[ — GRe Gi m ] _ hs\\ 

2 II [GRe Gim]*[GR e Gi m ] — -?2s| ■ 



(A.78) 



By Theorem IA.5I and Lemma IA.9I we have 



F M| [GRe Gi m ] * [GRe G Im ] - hs \\ > 3^ ^ + t ) 

( nt 2 \ 

<2exp (A.79) 



for t > satisfying 

/2m t 



n 3 



62 



By the symmetry of the Gaussian distribution, we also have 



P || [— Gr £ Gim]*!— Gr c Gi m 



[2m 

hs\\ > 3W — + i 



) 



( nt2 \ 
<2cxp(--j. 



(A.80) 



Combining dA.78t - dA.80l ) completes the proof. 
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